eXGraphs

A tool for generating and storing property graphs from XML resources

What is eXGraphs?

eXGraphs is a generic web application that extracts property graphs from XML using XPath syntax. The tool can process any kind of XML resource and extracts the information selected by the user. It then creates nodes and relations from that information and returns them.

Use Case

eXGraphs was developed within the context of the project “Between Theology, Early Modern Science and Political Correspondence: The Socinian Exchange of Letters.” This digital scholarly edition of letters aims to provide access to a closely interrelated network of subjects and agents. Index entries (persons, places, terms and celestial phenomena) are currently encoded in XML, but will be transferred to a Neo4j graph database. This is the use case of permanently shifting from one leading system to another. The advantage is that the complex relations between the entities from the indices can be modelled freely, without being forced into a strictly hierarchical ontology.

The project Regesta Imperii, which was established long before the advent of graph technologies, is an example of using graph extraction for analyses without changing the leading data storage system. A demonstration shows the analytical potential of graph structures extracted from XML indices.

When will eXGraphs be released?

We plan to release a beta version of eXGraphs in the fourth quarter of 2020.

Screenshots

[Fig. 1] Example of a Neo4j database schema filled with content by eXGraphs.

[Fig. 2] Example of a graph export of <correspDesc> metadata and index entries connected to a letter.

Quickstart – How does eXGraphs work?

eXGraphs crawls XML resources or repositories and creates nodes and relations based on the XML. The results can be stored directly in a graph database or exported as text in various formats.

Nodes

To create nodes, XML sources are searched for entities that the user specifies using XPath syntax. For this, every node and every XML resource needs at least one unique identifier.
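
As a rough illustration, a node definition in the configuration format shown under "Configuration Template" might look like the sketch below. It assumes a hypothetical TEI index file in which every <person> element carries an xml:id. The label Person, the XPath expressions, the property names id and name, and the tei prefix (which would have to be declared under <namespaces>) are all assumptions made for this example and are not taken from the eXGraphs documentation.

```xml
<!-- Hypothetical source fragment this definition is written for:
     <person xml:id="p001"><persName>Example Person</persName></person> -->
<node id="A" method="create">
    <!-- label given to every node created from a matched element (assumed value) -->
    <label>Person</label>
    <!-- XPath selecting the entities in the XML resource (assumed value) -->
    <baseUri>//tei:person</baseUri>
    <attributes>
        <!-- unique identifier, read relative to the element matched by baseUri -->
        <attribute name="id">@xml:id</attribute>
        <!-- an additional, non-identifying property (assumed) -->
        <attribute name="name">tei:persName</attribute>
    </attributes>
</node>
```

Under these assumptions, one Person node would be created per <person> element matched by the XPath in <baseUri>, with the id attribute serving as its unique identifier.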

Relations

After the nodes have been created, relations between them can be added by matching associated nodes. For this, nodes are identified by their unique identifiers in the XML resources.
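
A relation could be sketched as follows, again in the format of the configuration template further down. The example assumes that Person nodes with an id property already exist in the graph, and that each letter references persons through a ref attribute holding exactly that identifier. The labels Letter, Person and MENTIONS, all XPath expressions, and the idea that the nested <baseUri> is evaluated relative to the outer node are assumptions for illustration only.

```xml
<node id="A" method="create">
    <label>Letter</label>
    <baseUri>//tei:TEI</baseUri>
    <attributes>
        <!-- unique identifier of the letter (assumed to be its xml:id) -->
        <attribute name="id">@xml:id</attribute>
    </attributes>

    <nodes>
        <!-- method="match": look up an existing node instead of creating one -->
        <node id="B" method="match">
            <label>Person</label>
            <baseUri>.//tei:persName[@ref]</baseUri>
            <attributes>
                <!-- assumed: @ref holds the same identifier as the Person node's id -->
                <attribute name="id">@ref</attribute>
            </attributes>
        </node>
    </nodes>

    <relations>
        <!-- connect each letter (A) to every matched person (B) -->
        <relation from="A" to="B">
            <label>MENTIONS</label>
        </relation>
    </relations>
</node>
```

The from and to attributes refer to the id values of the <node> elements in the configuration, not to identifiers in the XML data.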

How can I use eXGraphs?

Configuration Template

<eXGraphs>
    <configuration>
        <neo4j>
            <host>###MY_NEO4J_HOST_ADDRESS###</host>
            <username>###MY_NEO4J_USERNAME###</username>
            <password>###MY_NEO4J_PASSWORD###</password>
        </neo4j>

        <namespaces>
            <ns prefix="###MY_NAMESPACE_PREFIX###" uri="###URI_TO_NAMESPACE###"></ns>
            <ns ...
        </namespaces>
    </configuration>

    <nodes>
        <node id="A" method="create">
            <label>###LABEL_OF_THE_NODE###</label>
            <baseUri>###XPATH_TO_ENTITY_IN_XML###</baseUri>
            <attributes>
                <attribute name="###NAME_OF_UNIQUE_IDENTIFIER_IN_XML###">###XPATH_TO_ATTRIBUTE_FROM_baseURI###</attribute>
                <attribute ...
            </attributes>

            <nodes>

                <node id="B" method="match">
                    <label>###LABEL_OF_ENTITY_IN_XML###</label>
                    <baseUri>###XPATH_TO_ENTITY_IN_XML###</baseUri>
                    <attributes>
                        <attribute name="###NAME_OF_UNIQUE_IDENTIFIER_IN_XML###">###XPATH_TO_ATTRIBUTE_FROM_baseURI###</attribute>
                        <attribute ...
                    </attributes>
                </node>

                <node ...

            </nodes>
            <relations>

                <relation from="A" to="B">
                    <label>###NAME_OF_RELATION###</label>
                </relation>

                <relation ...

            </relations>
        </node>
    </nodes>

    <collection>
        <resource type="collection" uri="###URI_TO_XML_RESOURCE(S)###" />
        <resource ...
    </collection>

</eXGraphs>

How can I provide my XML resources?

eXGraphs can collect XML resources in several ways, selected by the value of the type attribute of the <resource> element. Possible values are:

Value        Result
collection   Recursively collects all XML resources in an XML collection, e.g. via a URI to a REST API.
url          Collects a single XML resource via URL.
xml          Collects the data from an XML structure given as the content of the <resource> element.
folder       Collects XML files from a local folder (only works when eXGraphs is installed locally).
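
The four variants might be written as in the following sketch. All URIs and paths are placeholders rather than real endpoints. The template above only shows the uri attribute together with type="collection", so its use for url and folder is an assumption, as is the inline sample data.

```xml
<collection>
    <!-- recursively collect everything below a collection URI (placeholder) -->
    <resource type="collection" uri="https://example.org/rest/db/letters" />

    <!-- collect one XML document via its URL (placeholder) -->
    <resource type="url" uri="https://example.org/data/persons.xml" />

    <!-- collect files from a local folder; local installations only (placeholder path) -->
    <resource type="folder" uri="/path/to/local/xml" />

    <!-- inline XML given directly as the content of the element -->
    <resource type="xml">
        <persons>
            <person xml:id="p001">Example Person</person>
        </persons>
    </resource>
</collection>
```

Since the template allows repeated <resource> elements, several sources can presumably be listed in one configuration; whether different types can be mixed freely is not stated in the source.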

API

Users can submit configuration files via the REST API by sending them to https://exgraphs.lod.academy/post/{returnValue}. The {returnValue} determines what is returned and can be one of the following:

Value   Result
neo4j   Connects to the Neo4j database given in the configuration file and creates all extracted nodes and relations directly in the database.
query   Returns the Cypher query, which can then be used to create the nodes and relations in Neo4j manually.
json    Returns nodes and relations in JSON syntax.