The emergence of new Hadoop-based architectures in 2011 drastically reduced the cost of processing large quantities of data, allowing companies to exploit the full potential of all the information available to them, including information from unstructured sources. In this scenario, Exprivia offers solutions built on its experience in the "Big Data" world, which dates back to before the buzzword became widespread. This experience produced "Deep Knowledge", whose algorithms were then used to create "Big Knowledge", which is based on the Hadoop architecture.
Exprivia projects are characterized by an innovative data refinery approach to structured, unstructured or hybrid data. The Exprivia proprietary solutions BigKnowledge and Normasearch play a particularly significant role.
Management of structured and unstructured data
In some cases, the main need is to manage large volumes of mainly structured data. Companies can take advantage of hardware savings and of the ability to run analyses and calculations on the fly, operations that are normally onerous on standard architectures.
An example of this is a statistical dashboard that offers interactive charts backed by a Big Data search engine (Cloudera Search). It guarantees rapid analyses on ad hoc datasets that are defined by the analyst and therefore cannot be calculated in advance. Using a search engine rather than a standard database speeds up the entire process significantly.
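As a rough illustration of how such a dashboard might ask the search engine for chart data, the sketch below builds a faceted query URL. Cloudera Search is built on Apache Solr, and `q`, `fq`, `rows`, `wt`, `facet` and `facet.field` are standard Solr request parameters. The host, the collection name `sales` and the field names are assumptions made for this example and do not come from the project described above.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, collection, filters, facet_fields):
    """Build a Solr select URL that returns facet counts only (rows=0)."""
    params = [("q", "*:*"), ("rows", "0"), ("wt", "json"), ("facet", "true")]
    # Each filter chosen by the analyst becomes a filter query (fq).
    for field, value in sorted(filters.items()):
        params.append(("fq", f'{field}:"{value}"'))
    # Each charted dimension becomes a facet field.
    for field in facet_fields:
        params.append(("facet.field", field))
    return f"{base_url}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host, collection and fields.
    print(build_facet_query(
        "http://localhost:8983/solr",
        "sales",
        {"region": "Puglia", "year": "2015"},
        ["product_category"],
    ))
```

Because the counts are computed by the engine at query time over whatever filters the analyst picks, nothing has to be aggregated in advance.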
Another example is a back-end system for receipt data, developed on a Big Data platform (Cloudera), which consists of two main components:
- a printed receipt recognition engine. Starting from a photograph of the invoice or receipt, the system uses pattern matching techniques to identify information such as the shop, the date and time of payment, the total value and the individual products. This data is fed into a Hive database;
- an analytical system that, starting from the data collected, measures the penetration of a product within a geolocated demographic profile (the profile data is collected by the app through a social login or standard registration) and compares sales between different products within the same large-scale retail chain or between competing chains. The information is made available through data access services (in which case the customer integrates them into its own business intelligence systems) or through reports based on Pentaho (an open source BI solution). The entire system runs on the Amazon cloud.
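The two components above can be sketched in a few lines. This is a minimal illustration and not the system's actual implementation: the receipt layout, the regular expressions, the segment labels and the function names are all assumptions. Penetration is approximated here as the share of receipts in a segment that contain the product.

```python
import re
from collections import defaultdict

# Hypothetical text extracted from a photographed receipt.
SAMPLE = """\
SUPERMARKET ROSSI
12/03/2015 18:42
PASTA 500G      0.89
OLIVE OIL 1L    5.49
TOTAL           6.38
"""

DATE_RE = re.compile(r"(\d{2}/\d{2}/\d{4})\s+(\d{2}:\d{2})")
# A description, at least two spaces, then a price with two decimals.
LINE_RE = re.compile(r"^(?P<name>.+?)\s{2,}(?P<price>\d+\.\d{2})$")


def parse_receipt(text):
    """Extract shop, date, time, total and items via pattern matching."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    receipt = {"shop": lines[0], "date": None, "time": None,
               "total": None, "items": []}
    for ln in lines[1:]:
        m = DATE_RE.search(ln)
        if m:
            receipt["date"], receipt["time"] = m.groups()
            continue
        m = LINE_RE.match(ln)
        if not m:
            continue
        price = float(m.group("price"))
        if m.group("name").upper() == "TOTAL":
            receipt["total"] = price
        else:
            receipt["items"].append((m.group("name"), price))
    return receipt


def penetration(tagged_receipts, product):
    """Share of receipts per segment that contain the given product."""
    seen, hits = defaultdict(int), defaultdict(int)
    for segment, receipt in tagged_receipts:
        seen[segment] += 1
        if any(name == product for name, _ in receipt["items"]):
            hits[segment] += 1
    return {segment: hits[segment] / seen[segment] for segment in seen}


if __name__ == "__main__":
    print(parse_receipt(SAMPLE))
```

In the described system the parsed records would be loaded into Hive and the aggregation would run there. The in-memory version only shows the shape of the data and of the calculation.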
In other cases, unstructured or hybrid data prevails, and the information retrieval capacity of Big Knowledge is needed to search for and select all the necessary information on the open web and in internal documents (PDF, Word, e-mail, etc.), reducing information overload through advanced semantic analysis techniques. The system uses the artificial intelligence of Big Knowledge (BK) to create rules based on the information processed and then automatically searches the web or the document silos for the relevant information. The next step is to structure this information so that it is available to the customer in a simple and user-friendly way.
This design versatility gives rise to many interesting solutions:
- Normasearch, which searches the open web for news, selecting it according to rules that the system itself generates once it has been suitably instructed;
- DFA for pre-trade reconstruction. The system analyses all the exchanges of information about the deals (voice calls, chats and e-mail) so as to be able to certify that every deal has been closed in compliance with the internal procedures;
- Threat Intelligence, which uses the information retrieval component of Big Knowledge to search both the open web and the deep web for information about new cyber attacks, not only in relation to security regulations. In this case, the data is classified using Expert System’s "Cogito" product;
- Competitive Intelligence, which searches a series of profiled sites for information about tenders. Items useful for analysis (contractor, price, type of construction, duration of contract, etc.) are stored in structured form in a traditional database, significantly reducing the time analysts need to retrieve this data;
- Asset Protection, which monitors all exchanges of information about documents protected by non-disclosure agreements. The system creates a graphical display of all exchanges for forensic purposes and can determine whether a document is protected and how it is protected.
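To make the Competitive Intelligence case more concrete, the sketch below stores extracted tender attributes in a relational table and runs one simple analyst query. It uses SQLite from the Python standard library as a stand-in for the "traditional DB". The table name, column names and helper functions are hypothetical and are not taken from the Exprivia solution.

```python
import sqlite3


def store_tenders(conn, tenders):
    """Insert extracted tender records (dicts) into a relational table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tender (
               contractor TEXT,
               price REAL,
               construction_type TEXT,
               duration_months INTEGER
           )"""
    )
    conn.executemany(
        "INSERT INTO tender VALUES "
        "(:contractor, :price, :construction_type, :duration_months)",
        tenders,
    )
    conn.commit()


def average_price_by_type(conn):
    """Return the average tender price for each type of construction."""
    rows = conn.execute(
        "SELECT construction_type, AVG(price) FROM tender "
        "GROUP BY construction_type ORDER BY construction_type"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Made-up records, standing in for items extracted from tender sites.
    store_tenders(conn, [
        {"contractor": "A", "price": 100.0,
         "construction_type": "bridge", "duration_months": 12},
        {"contractor": "B", "price": 200.0,
         "construction_type": "bridge", "duration_months": 18},
    ])
    print(average_price_by_type(conn))
```

Once the attributes sit in ordinary columns, analysts can answer such questions with a single query instead of rereading the source documents.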
The management of data in all its forms is complemented by the construction of a Data Lake. The Data Lake is the fundamental Big Data infrastructure: it contains all the detailed information available about the customers and is put at the disposal of the in-house data scientists for subsequent processing.