In the evolving world of artificial intelligence (AI) and machine learning (ML), success hinges on the dependability of the underlying data. As AI adoption spreads across companies, ensuring that the data it relies on is trustworthy becomes critically important.
Data and data sets are the lifeblood of an AI system. Without access to high-quality data, AI systems struggle to learn effectively and to perform their intended functions. Challenges such as inconsistent primary keys and data duplication show that data quality is never guaranteed.
Recent discussions, such as the one hosted by Semantic Arts during the November 2023 Estes Park Group meeting, shed light on the pitfalls that can lead to untrustworthy software data. Blair Kjenner, the founder of information management firm Method1 Enterprise Software, pointed to inconsistent primary key methods and data models as contributors to integration challenges, which in turn result in duplicated system-level functions and data siloing.
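The primary key problem described above can be illustrated with a minimal sketch. The records, field names, and the choice of a normalized email address as the shared key are all hypothetical, chosen only for illustration; real entity matching is usually much harder than this.

```python
# Hypothetical records for the same customer, held in two systems
# that each use their own primary key scheme.
crm_rows = [{"id": 1017, "email": "Ana.Silva@example.com", "name": "Ana Silva"}]
billing_rows = [{"id": "CUST-00A7", "email": "ana.silva@example.com ", "name": "A. Silva"}]


def naive_merge(*sources):
    """Key every record on its system-local primary key."""
    merged = {}
    for system, rows in sources:
        for row in rows:
            merged[(system, row["id"])] = row
    return merged


def merge_on_shared_key(*sources):
    """Key every record on a normalized attribute both systems share.

    Using the email address as that attribute is an assumption of this sketch.
    """
    merged = {}
    for system, rows in sources:
        for row in rows:
            key = row["email"].strip().lower()
            entry = merged.setdefault(key, {"local_ids": {}, "names": set()})
            entry["local_ids"][system] = row["id"]
            entry["names"].add(row["name"])
    return merged


sources = (("crm", crm_rows), ("billing", billing_rows))
naive = naive_merge(*sources)
shared = merge_on_shared_key(*sources)
print(len(naive), len(shared))  # 2 1
```

With system-local keys, the same person survives integration as two unrelated records. With a shared key, the two records collapse into one entity that still remembers both local identifiers.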
Companies around the world are experimenting with different ways to make their data trustworthy. One noteworthy approach is the Zero-Copy Integration standard introduced by Canada’s Data Collaboration Alliance. Ratified by the Standards Council of Canada in February 2023, it advocates access-based data collaboration, in which applications are granted access to shared data rather than given their own copies, to eliminate data duplication.
Another approach is to use Uniform Resource Identifiers (URIs) and Internationalized Resource Identifiers (IRIs) to address problems with how data is identified and organized. Assigning each resource a unique identifier, and using content-addressed technologies such as the InterPlanetary File System (IPFS), can help companies organize their data better and avoid problems like broken links.
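As a rough sketch of the two ideas in this paragraph, the snippet below mints a unique identifier for an entity and derives an address from a document's content. The `data.example.org` namespace and the sample payloads are hypothetical. The `sha256:` address is a deliberate simplification: it shows the principle of content addressing, not the actual identifier format that IPFS uses.

```python
import hashlib
import uuid

BASE = "https://data.example.org/id/"  # hypothetical namespace for this sketch


def mint_uri(kind: str) -> str:
    """Mint a globally unique identifier for a new entity of the given kind."""
    return f"{BASE}{kind}/{uuid.uuid4()}"


def content_address(payload: bytes) -> str:
    """Derive an address from the bytes themselves (simplified, not an IPFS CID)."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()


doc_v1 = b"customer,region\nAna Silva,EMEA\n"
doc_v2 = b"customer,region\nAna Silva,APAC\n"

addr_v1 = content_address(doc_v1)
addr_v2 = content_address(doc_v2)

# A toy content-addressed store: the key is computed from the value.
store = {addr_v1: doc_v1, addr_v2: doc_v2}

# Any reader can check that what came back is what the address promised.
assert content_address(store[addr_v1]) == addr_v1
```

Because the address is computed from the content, it does not depend on where the data is hosted, and a changed document gets a new address instead of silently replacing the old one. That is the property that helps with broken or stale links.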
Additionally, there is a focus on disambiguation with knowledge graphs. Platforms like OriginTrail combine web semantics, cryptography and decentralized graph networking. Knowledge graphs, blending symbolic logic with explicit relationship data, disambiguate information, providing deterministic facts to complement statistical machine learning approaches.
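To make the disambiguation idea concrete, here is a minimal sketch of a knowledge graph as a set of subject-predicate-object triples in plain Python. The `example.org` identifiers, predicate names, and facts are illustrative assumptions. The sketch is not based on OriginTrail's API or any other specific platform.

```python
EX = "https://example.org/id/"  # hypothetical identifier namespace

# Two different entities share the label "Mercury" but have distinct identifiers.
triples = {
    (EX + "mercury-planet", "label", "Mercury"),
    (EX + "mercury-planet", "type", "Planet"),
    (EX + "mercury-planet", "orbits", EX + "sun"),
    (EX + "mercury-element", "label", "Mercury"),
    (EX + "mercury-element", "type", "ChemicalElement"),
    (EX + "mercury-element", "atomicNumber", "80"),
}


def subjects(predicate, obj):
    """Return every subject that has the given predicate/object pair."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def disambiguate(label, expected_type):
    """Resolve an ambiguous label by intersecting it with an explicit type."""
    return subjects("label", label) & subjects("type", expected_type)


candidates = subjects("label", "Mercury")      # two entities match the label
resolved = disambiguate("Mercury", "Planet")   # one entity matches label and type
print(len(candidates), resolved)
```

The label alone is ambiguous, but the explicit relationships attached to each identifier give a deterministic answer, which is the kind of fact a statistical model can be checked against.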