The Infinit.e platform features a robust set of data harvesters that provide its data extraction and transformation (enrichment) capabilities. Infinit.e's harvesters are designed to consume data from a variety of sources and media types, including:
- Web-based content accessible via URL, including:
  - Static HTML content;
  - RSS and Atom news feeds;
  - RESTful web service interfaces;
- Traditional relational database management systems (RDBMS) via Java Database Connectivity (JDBC) drivers;
- Files located on local and network-attached storage devices.
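To make the distinction between these media types concrete, the following Java sketch routes a source location to a harvester category based on its URI scheme. This is a hypothetical illustration only: the `SourceRouter` class, the `SourceMedium` enum, and the scheme-to-category rules are assumptions for this example and are not part of the Infinit.e API.

```java
import java.net.URI;

// Hypothetical sketch: SourceRouter, SourceMedium and the scheme rules are
// illustrative assumptions, not part of the Infinit.e API.
public class SourceRouter {

    enum SourceMedium { WEB, DATABASE, FILE, UNKNOWN }

    /** Picks a harvester category from the scheme of a source location string. */
    static SourceMedium classify(String location) {
        String scheme;
        try {
            scheme = URI.create(location).getScheme();
        } catch (IllegalArgumentException e) {
            return SourceMedium.UNKNOWN; // not a parseable URI
        }
        if (scheme == null) {
            return SourceMedium.UNKNOWN; // relative path with no scheme
        }
        switch (scheme.toLowerCase()) {
            case "http":
            case "https":
                return SourceMedium.WEB;      // static HTML, RSS/Atom feeds, REST endpoints
            case "jdbc":
                return SourceMedium.DATABASE; // e.g. jdbc:mysql://host:3306/db
            case "file":
            case "smb":
                return SourceMedium.FILE;     // local or network-attached storage
            default:
                return SourceMedium.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("https://example.com/feed.rss"));  // WEB
        System.out.println(classify("jdbc:mysql://dbhost:3306/news")); // DATABASE
        System.out.println(classify("smb://nas01/share/reports/"));    // FILE
    }
}
```

Note that the scheme alone cannot tell an RSS feed from a static HTML page or a REST endpoint, since all three are served over HTTP(S); a real harvester would need explicit source configuration or content inspection to make that distinction.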
Infinit.e's data harvesting process can be summed up in the following steps:
- Extract metadata (and full text, if relevant) from a data source and create a Document object.
- Enrich the source metadata by extracting entities, events, geographic data, etc., using one or more of the following options:
  - Structured Analysis Handler
  - Unstructured Analysis Handler
  - Standard extraction code
  - Third-party extraction libraries
- Update entity counts and aggregates (generic processing: statistics).
- Store the Document object in Infinit.e's MongoDB data store.
- Index the updated data store using Elasticsearch (generic processing: aggregation).
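The harvesting steps above can be sketched as a simple in-memory pipeline. This is a hypothetical illustration, not Infinit.e's implementation: the `Document`, `Enricher`, and `harvest` names are assumptions, and plain Java collections stand in for the entity aggregates, the MongoDB store, and the Elasticsearch index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: every type and method name here is illustrative
// and is not the Infinit.e API.
public class HarvestPipelineSketch {

    /** Minimal stand-in for Infinit.e's Document object. */
    static class Document {
        final String url;
        final String fullText;
        final List<String> entities = new ArrayList<>();
        Document(String url, String fullText) { this.url = url; this.fullText = fullText; }
    }

    /** One enrichment option (structured handler, unstructured handler, third-party library, ...). */
    interface Enricher { void enrich(Document doc); }

    // In-memory stand-ins for the entity aggregates, the data store and the search index.
    final Map<String, Integer> entityCounts = new HashMap<>();
    final List<Document> store = new ArrayList<>();
    final Map<String, List<String>> index = new HashMap<>();

    void harvest(String url, String rawText, List<Enricher> enrichers) {
        // 1. Extract: wrap the source content in a Document.
        Document doc = new Document(url, rawText);
        // 2. Enrich: run each configured extraction option in turn.
        for (Enricher e : enrichers) e.enrich(doc);
        // 3. Update entity counts (statistics).
        for (String entity : doc.entities) entityCounts.merge(entity, 1, Integer::sum);
        // 4. Store the document.
        store.add(doc);
        // 5. Index: map each entity to the URLs of documents that mention it.
        for (String entity : doc.entities)
            index.computeIfAbsent(entity, k -> new ArrayList<>()).add(doc.url);
    }

    public static void main(String[] args) {
        // Toy enricher: treats capitalized words as entities (illustration only).
        Enricher capitalized = doc -> {
            for (String w : doc.fullText.split("\\W+"))
                if (!w.isEmpty() && Character.isUpperCase(w.charAt(0))) doc.entities.add(w);
        };
        HarvestPipelineSketch p = new HarvestPipelineSketch();
        p.harvest("http://example.com/a", "Alice met Bob in Paris", List.of(capitalized));
        p.harvest("http://example.com/b", "Bob flew to Paris", List.of(capitalized));
        System.out.println(p.entityCounts.get("Paris")); // 2
        System.out.println(p.index.get("Bob"));          // [http://example.com/a, http://example.com/b]
    }
}
```

Modeling each enrichment option as an interchangeable `Enricher` mirrors the "one or more of the following options" wording: handlers can be combined in any order without changing the surrounding extract, aggregate, store, and index steps.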
This page, and the resources linked from it, will help you get started with the process of creating your own sources.
Important Note: All registered users of the IKANOW Developer API who have been issued an API key can create sources; however, the following restrictions apply to new sources:
- New sources must be approved by a system administrator before they can start harvesting data. Factors that system administrators consider when reviewing and approving new sources include:
  - Does an existing source already harvest the same data?
  - Has the source been fully tested and shown to function properly?
  - Does the new source violate the API terms of service related to data ownership, appropriateness, etc.?
- All new sources will be added to public communities and will be visible to all developers.
Creating a Source
The following resources describe the various steps and JSON objects involved in creating new sources:
- Specifying a data source
How to specify the mechanics required to extract data from a source system.
- Using the Structured Analysis Harvester
Introduction to using the Structured Analysis Harvester to create or extract entities, events, and geo-data from structured data sources like database records and XML files.
- Using the Unstructured Analysis Harvester
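As a rough illustration of what the Structured Analysis Harvester does with structured data such as database records, the following Java sketch creates one entity per mapped field of a record. It is hypothetical: the `Entity` record, the field-to-type mapping, and `extractEntities` are assumptions for this example and do not reflect the harvester's actual configuration format (the record syntax requires Java 16 or later).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the field-to-type mapping and the Entity record are
// illustrative assumptions, not the Structured Analysis Harvester's configuration format.
public class StructuredEntitySketch {

    record Entity(String value, String type) {}

    /** Creates one entity per mapped, non-blank field of a structured record (e.g. a database row). */
    static List<Entity> extractEntities(Map<String, String> record, Map<String, String> fieldToType) {
        List<Entity> entities = new ArrayList<>();
        for (Map.Entry<String, String> mapping : fieldToType.entrySet()) {
            String value = record.get(mapping.getKey());
            if (value != null && !value.isBlank()) {
                entities.add(new Entity(value.trim(), mapping.getValue()));
            }
        }
        return entities;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("reporter", "Jane Smith");
        row.put("city", "Boston");
        row.put("notes", "free text, not mapped");

        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("reporter", "Person");
        mapping.put("city", "City");

        System.out.println(extractEntities(row, mapping));
        // [Entity[value=Jane Smith, type=Person], Entity[value=Boston, type=City]]
    }
}
```

Unmapped fields (such as `notes` above) are ignored here; in practice, free-text fields like these are the kind of content the Unstructured Analysis Harvester is meant to handle.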
Source Reference Documents
The following links provide detailed information, supplementing the introductory materials above, about the JSON objects that make up a Source document and the individual fields within each object.
Sample Source Documents
The following sample source documents are provided as an aid to learning how to create your own sources: