ANAC – Text mining on BDNCP and Gazzetta Ufficiale

A study on matching BDNCP data against the Gazzetta Ufficiale to monitor data quality and the publication of public procurement notices.

Through advanced data mining techniques, the project will match the BDNCP data against the tender notices published in the Gazzetta Ufficiale to improve the quality of data on public procurement and the transparency of tenders.


  1. Improving the quality and completeness of BDNCP records, by acquiring missing data concerning tender outcomes from the corresponding Gazzetta Ufficiale notices.
  2. Improving the transparency of tenders, by evaluating compliance with the duty to publish tender notices in the Gazzetta Ufficiale.


The project will benefit both the transparency and the monitoring of public procurement. It will verify that calls for tenders have been published in the Gazzetta Ufficiale, and it will enable the acquisition of data on tender winners for the many BDNCP records that currently lack this information. The project will also pave the way for a broader use of textual databases connected to public procurement.


A web crawler will first be developed to download and index all tender announcements and their outcomes published in the Gazzetta Ufficiale. Then, using the CIG (Codice Identificativo Gara, the tender identification code) and other types of keys, the BDNCP records subject to publication in the Gazzetta Ufficiale will be matched against the Gazzetta Ufficiale announcements in order to evaluate compliance with the duty of publication. In the second phase of the project, the BDNCP records lacking outcome values will be paired with the corresponding tender outcomes published in the Gazzetta Ufficiale, and the most important information (i.e. the winner, date and amount) will be extracted from the relevant Gazzetta Ufficiale documents using advanced data mining techniques. The accuracy of the extracted information will be evaluated experimentally.
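
The CIG-based matching step described above could be sketched roughly as follows. This is an illustrative sketch only, not the project's actual implementation. It assumes that a CIG is a 10-character alphanumeric code introduced in the notice text by the label "CIG", that BDNCP records are available as dictionaries with a hypothetical `cig` field, and that the crawler has produced a mapping from hypothetical notice identifiers to plain text. The function names `extract_cigs` and `match_records` are invented for this example.

```python
import re

# Assumption: a CIG is a 10-character alphanumeric code that follows the
# label "CIG" in the notice text, optionally separated by spaces, ":" or ".".
CIG_PATTERN = re.compile(r"\bCIG\b[\s:.]*([0-9A-Z]{10})\b", re.IGNORECASE)


def extract_cigs(text):
    """Return the set of CIG codes found in one notice text (upper-cased)."""
    return {code.upper() for code in CIG_PATTERN.findall(text)}


def match_records(bdncp_records, notices):
    """Match BDNCP records against crawled notices using the CIG as key.

    bdncp_records: list of dicts with a (hypothetical) "cig" field.
    notices: dict mapping a (hypothetical) notice id to its plain text.
    Returns (matched, unmatched): matched maps each CIG to the ids of the
    notices that mention it; unmatched lists the CIGs found in no notice.
    """
    # Build an index from CIG to the notices that mention it.
    index = {}
    for notice_id, text in notices.items():
        for cig in extract_cigs(text):
            index.setdefault(cig, []).append(notice_id)

    matched, unmatched = {}, []
    for record in bdncp_records:
        cig = record["cig"].upper()
        if cig in index:
            matched[cig] = index[cig]
        else:
            unmatched.append(cig)
    return matched, unmatched
```

In this sketch, the records in `unmatched` are candidates for non-compliance with the duty of publication, although in practice they would first need to be checked against the other types of keys mentioned above, since a notice may omit or misprint the CIG.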
