The matchmaker for Data
A data register is a smart way to organize all data available in a company. The register shows what data is available, who owns it and the context in which the data was produced. Dexes offers a data register with intelligent search. The data register can store references not only to data, but also to apps, models and other information related to data. A register does not hold the data itself. It is like a phonebook that only stores metadata that describes the data. The register’s main purpose is to find data and match it with the users’ needs and context. This article describes how the Dexes data register and its matchmaking work.
One data register to collect them all
A data register is an online register of datasets. It holds only information about datasets, i.e. metadata, from all kinds of sources, and never the data itself. This metadata includes, for example, the owner, title, description, keywords, data formats, language and classification. In this way the data register acts like a phonebook for quickly finding the data and resources you are looking for. Like a phonebook, the register points you to the place where the data is actually stored.
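To make the phonebook idea concrete, here is a minimal sketch of what a metadata record and a lookup could look like. The class name, field names, `location` field and helper functions are assumptions made for illustration; they are not the actual Dexes data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetRecord:
    """Metadata describing a dataset. The data itself stays at `location`."""
    title: str
    owner: str
    description: str
    location: str  # hypothetical field: where the data is actually stored
    keywords: List[str] = field(default_factory=list)
    formats: List[str] = field(default_factory=list)
    language: str = "en"
    classification: str = "public"


# The register maps an identifier to metadata only, like a phonebook entry.
register: Dict[str, DatasetRecord] = {}


def register_dataset(identifier: str, record: DatasetRecord) -> None:
    """Add or overwrite the metadata entry for a dataset."""
    register[identifier] = record


def lookup_location(identifier: str) -> str:
    """Return where the data lives; the register never returns the data."""
    return register[identifier].location
```

The point of the sketch is that a lookup yields a pointer to the data, never the data itself.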
The Dexes data register collects information about datasets in several ways:
- A user/organization can register datasets by hand by filling out a form on the Dexes website
- A user/organization can automate the publishing of their datasets via the dataset API that Dexes makes available
- The register can synchronize with other (parts of) registers around the world (assuming those registers present an API of sorts)
Once a data register contains datasets, its task is a regular information retrieval problem: satisfying the information need of a user. In this case a user is searching for datasets and the register needs to serve the relevant ones, based on metadata only.
Before the user is able to search for datasets, all metadata needs to be indexed in a search engine. The Dexes search engine applies well-known techniques while indexing: removing stop words, adding synonyms, transforming all characters to lowercase, transliterating special characters to their base character (e.g. é to e), and applying stemming, which tries to reduce a word to its word stem. These techniques make it possible to search for e.g. “belgium vehicle taxes” and find datasets about “België” (Belgium in Dutch), cars (synonym for/type of vehicle) and tax (singular/stem of taxes), which are clearly relevant for the given query.
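The indexing steps above can be sketched as a small text analysis pipeline. This is a simplified illustration, not the Dexes implementation: the stop word list, the synonym table and the naive plural-stripping `stem` function are assumptions, and a production search engine would use a proper stemmer and much larger dictionaries.

```python
import re
import unicodedata

# Tiny illustrative dictionaries (assumptions, not the real configuration).
STOP_WORDS = {"a", "and", "for", "in", "of", "the"}
SYNONYMS = {"belgie": "belgium", "car": "vehicle"}


def strip_accents(text):
    """Transliterate accented characters to their base character (é -> e)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stem(token):
    """Naive plural stripping; real engines use a proper stemming algorithm."""
    if len(token) > 3 and token.endswith(("ses", "xes", "zes", "ches", "shes")):
        return token[:-2]
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def analyze(text):
    """Lowercase, transliterate, tokenize, drop stop words, stem, map synonyms."""
    text = strip_accents(text.lower())
    tokens = re.findall(r"[a-z0-9]+", text)
    tokens = [t for t in tokens if t not in STOP_WORDS]
    tokens = [stem(t) for t in tokens]
    return [SYNONYMS.get(t, t) for t in tokens]


query_terms = set(analyze("belgium vehicle taxes"))
document_terms = set(analyze("Cars and tax in België"))
matches = query_terms <= document_terms
```

Because the same `analyze` function is applied to both the query and the indexed metadata, the query and the document title reduce to the same set of terms and `matches` is true.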
Moreover, the search engine provides additional functionality, such as giving suggestions while typing, spell checking while searching, highlighting matched text in the results and presenting search facets for quickly filtering on e.g. a specific data format.
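As an illustration of search facets, the sketch below counts how many results carry each value of a metadata field and then filters on one value. The result structure and the function names are hypothetical; a real search engine computes facets inside the index rather than over a Python list.

```python
from collections import Counter


def facet_counts(results, field_name):
    """Count how often each value of `field_name` occurs in the results."""
    counts = Counter()
    for result in results:
        value = result.get(field_name, [])
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts


def filter_by_facet(results, field_name, wanted):
    """Keep only the results whose `field_name` contains the wanted value."""
    kept = []
    for result in results:
        value = result.get(field_name, [])
        values = value if isinstance(value, list) else [value]
        if wanted in values:
            kept.append(result)
    return kept


# Hypothetical search results, described by metadata only.
results = [
    {"title": "Vehicle taxes", "formats": ["csv", "json"]},
    {"title": "Road usage", "formats": ["csv"]},
    {"title": "Tax brochure", "formats": ["pdf"]},
]
format_facet = facet_counts(results, "formats")
csv_only = filter_by_facet(results, "formats", "csv")
```

The counts in `format_facet` are what a user would see next to each filter option, and `csv_only` is the narrowed result list after clicking the "csv" option.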
More than datasets
The Dexes data register offers more than only datasets. It enables users to make data requests, to form groups of similar datasets, and to register applications that use registered datasets. These additional entities are indexed in the same way, using the already configured search engine.
Find a dataset to relate to
A register can offer a traditional search box, but the Dexes data register offers more than that. It also lets users explore the data, for example by presenting related entities while the user is looking at a single relevant entity. Remember that an entity can be a dataset, data request, group or application.
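The article does not say how related entities are determined. One simple possibility, shown here purely as an assumption, is to rank other entities by the overlap of their keywords (Jaccard similarity). The entity identifiers and the `keywords` field are hypothetical.

```python
def jaccard(a, b):
    """Share of keywords two entities have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def related_entities(target_id, entities, limit=3):
    """Return ids of the entities most similar to the target, best first."""
    target = entities[target_id]
    scored = []
    for other_id, other in entities.items():
        if other_id == target_id:
            continue
        score = jaccard(target["keywords"], other["keywords"])
        if score > 0:
            scored.append((score, other_id))
    # Highest score first; ties are broken alphabetically by id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other_id for _, other_id in scored[:limit]]


# Hypothetical entities of different types, all described by keywords.
entities = {
    "dataset:vehicle-tax": {"keywords": ["tax", "vehicle", "belgium"]},
    "application:tax-calculator": {"keywords": ["tax", "vehicle"]},
    "request:road-usage": {"keywords": ["vehicle", "traffic"]},
    "group:weather": {"keywords": ["weather"]},
}
related = related_entities("dataset:vehicle-tax", entities)
```

Here the application shares two of three keywords with the dataset and ranks first, the data request shares one keyword and ranks second, and the weather group shares none and is left out.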
So far, we have talked about getting users with an information need to relevant datasets. But why are datasets registered in the first place? Users/organizations want to share or advertise their datasets, under specific conditions for how the data can be shared. Apart from handling conditions, a register needs to support both data-owner and user with safe and secure access to the data. In another article we explain more about how the Dexes data register handles conditions for sharing data and closing a data deal.
Data matchmaking platform
In conclusion, we see the Dexes data register as a matchmaking platform on which data providers and data consumers come together to exchange data. We give data providers the tools they need to share their data, and give data consumers the tools to find what they are looking for. Not only does it match data-owners that want to share datasets with users that have an information need, it also encourages data-owners and users to match their profiles and to make more use of existing data than ever. In this way the data register offers a lot more than a phonebook. It is the dating platform for data-owners and users that are looking for data. Dexes offers matchmaking for data as part of the Data catalogue solution.