ANAC – Text mining on BDNCP and Gazzetta Ufficiale

A study on matching BDNCP records against Gazzetta Ufficiale notices to monitor data quality and the publication of public procurement notices.

Using advanced data mining techniques, the project will match data from the BDNCP (Banca Dati Nazionale dei Contratti Pubblici) against the tender notices published on Gazzetta Ufficiale, in order to improve the quality of public procurement data and the transparency of tenders.


  1. Improving the quality and completeness of BDNCP records by acquiring missing data on tender outcomes from the corresponding Gazzetta Ufficiale notices.
  2. Improving the transparency of tenders by evaluating compliance with the duty to publish tender notices on Gazzetta Ufficiale.


The project will benefit both the transparency and the monitoring of public procurement: it will verify that invitations to tender have been published on Gazzetta Ufficiale, and it will make it possible to acquire data on tender winners for the many BDNCP records that currently lack this information. The project will also pave the way for a deeper use of the textual databases related to public procurement.


First, a web crawler will be developed to download and index all tender notices and outcomes published on Gazzetta Ufficiale. Then, using the CIG (Codice Identificativo Gara) and other types of keys, the BDNCP records that must be published on Gazzetta Ufficiale will be matched against the Gazzetta Ufficiale notices to evaluate compliance with the publication duty. In the second phase of the project, the BDNCP records with missing outcome values will be paired with the corresponding tender outcomes published on Gazzetta Ufficiale, and the most important outcome information (i.e., winner, date, and amount) will be extracted from the relevant Gazzetta Ufficiale documents using advanced data mining techniques. The accuracy of the extracted information will be evaluated experimentally.
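
The matching, compliance and extraction steps described above can be sketched in Python as follows. This is a minimal illustration under loudly stated assumptions, not the project's implementation: the `Record` layout, the CIG regular expression (a 10-character alphanumeric code following the label "CIG") and the Italian field labels ("Aggiudicatario", "Data di aggiudicazione", "Importo di aggiudicazione") are all hypothetical, and real Gazzetta Ufficiale notices vary widely in layout. The simple regular expressions are a rule-based stand-in, not the advanced data mining techniques the project intends to use. The crawler is omitted: notices are assumed to be already downloaded as plain text, keyed by a hypothetical notice id.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# ASSUMPTION: a CIG appears after the label "CIG" as a 10-character
# alphanumeric code, e.g. "CIG: 1234567ABC" or "CIG n. 1234567ABC".
CIG_RE = re.compile(r"\bCIG\b[\s:.n°]*([0-9A-Z]{10})\b")

# HYPOTHETICAL field labels for an outcome notice; real notices vary widely.
WINNER_RE = re.compile(r"Aggiudicatario:\s*(.+)")
DATE_RE = re.compile(r"Data di aggiudicazione:\s*(\d{2}/\d{2}/\d{4})")
AMOUNT_RE = re.compile(r"Importo di aggiudicazione:\s*€\s*([\d.]+,\d{2})")


@dataclass
class Record:
    """A simplified, hypothetical BDNCP record."""
    cig: str
    requires_gu: bool             # subject to the publication duty on GU?
    winner: Optional[str] = None  # None when the outcome is missing


def index_notices(notices: dict[str, str]) -> dict[str, list[str]]:
    """Map each CIG found in a notice's text to the ids of the notices citing it."""
    index: dict[str, list[str]] = {}
    for notice_id, text in notices.items():
        for cig in set(CIG_RE.findall(text)):
            index.setdefault(cig, []).append(notice_id)
    return index


def compliance_rate(records: list[Record], index: dict[str, list[str]]) -> float:
    """Share of records subject to the duty whose CIG is cited by at least one notice."""
    due = [r for r in records if r.requires_gu]
    if not due:
        raise ValueError("no records are subject to the publication duty")
    return sum(r.cig in index for r in due) / len(due)


def extract_outcome(text: str) -> dict:
    """Rule-based extraction of winner, date and amount (None when not found)."""
    winner = WINNER_RE.search(text)
    date = DATE_RE.search(text)
    amount = AMOUNT_RE.search(text)
    return {
        "winner": winner.group(1).strip() if winner else None,
        "date": date.group(1) if date else None,
        # Italian number format: "." groups thousands, "," marks decimals.
        "amount": (Decimal(amount.group(1).replace(".", "").replace(",", "."))
                   if amount else None),
    }


def fill_missing_winners(records: list[Record], index: dict[str, list[str]],
                         notices: dict[str, str]) -> None:
    """Second phase: fill missing winners from notices that cite the same CIG."""
    for r in records:
        if r.winner is not None:
            continue
        for notice_id in index.get(r.cig, []):
            winner = extract_outcome(notices[notice_id])["winner"]
            if winner:  # skip notices (e.g. invitations) with no winner field
                r.winner = winner
                break


def field_accuracy(predicted: list[dict], gold: list[dict], field: str) -> float:
    """Fraction of notices whose extracted field equals a manually checked value."""
    if not predicted or len(predicted) != len(gold):
        raise ValueError("need equally long, non-empty prediction and gold lists")
    return sum(p[field] == g[field] for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    # Toy data made up for illustration only.
    notices = {
        "GU-001": "Esito di gara. CIG: 1234567ABC\n"
                  "Aggiudicatario: Alfa S.r.l.\n"
                  "Data di aggiudicazione: 12/03/2014\n"
                  "Importo di aggiudicazione: € 1.250.000,00\n",
    }
    records = [Record("1234567ABC", True), Record("7654321DEF", True)]
    index = index_notices(notices)
    print(compliance_rate(records, index))  # 0.5
    fill_missing_winners(records, index, notices)
    print(records[0].winner)  # Alfa S.r.l.
```

Exact CIG matching is the simplest possible key; a fuller version would also use the other types of keys mentioned above for notices that omit or misprint the CIG. The `field_accuracy` helper shows one plausible way to run the experimental accuracy evaluation: compare extracted fields against a manually checked sample.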
