Massive Scale Data Integration on Graphcore Processing Units

Advances in machine learning for graph-structured data were on show at NeurIPS 2022, where Graphcore won the Knowledge Graph track of the OGB-LSC competition. The significance of these methods continues to rise, yet a key hurdle for researchers is scaling these models to handle...

In Knowledge Graph Completion (KGC), Balanced Entity Sampling and Sharing (BESS) is a strategy designed to make training more effective for entities that are rare or sparsely connected in the graph. This makes it particularly useful in Temporal Knowledge Graph (TKG) completion models.

BESS operates within an incremental training framework, supporting learning about entities not observed during initial training or those with few connections. It integrates global similarity measures and a weighted sampling strategy to improve existing KGC methods, focusing more on underrepresented or newly introduced entities.
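As a rough illustration of what a weighted sampling strategy of this kind could look like, the sketch below draws training triples with probabilities that favour rarely seen entities. The function names, the `alpha` parameter and the weighting formula are assumptions made for this example; they are not taken from the BESS paper or from Graphcore's code.

```python
import random
from collections import Counter


def inverse_frequency_weights(triples, alpha=0.5):
    """Return one sampling weight per triple; rarer entities weigh more.

    Hypothetical scheme: average the inverse degrees of head and tail,
    then raise to `alpha` (alpha=0 gives uniform sampling).
    """
    degree = Counter()
    for head, _relation, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    return [
        ((1.0 / degree[head] + 1.0 / degree[tail]) / 2.0) ** alpha
        for head, _relation, tail in triples
    ]


def sample_batch(triples, batch_size, alpha=0.5, seed=None):
    """Sample a batch of triples with replacement using the weights above."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(triples, alpha)
    return rng.choices(triples, weights=weights, k=batch_size)


# Toy graph: entity "a" is common, entities "e" and "f" are rare.
toy_triples = [("a", "r", "b"), ("a", "r", "c"), ("a", "r", "d"), ("e", "r", "f")]
toy_batch = sample_batch(toy_triples, batch_size=8, alpha=1.0, seed=0)
```

With `alpha=1.0`, the triple that links the two rare entities is drawn more often than any single triple involving the common entity, and lowering `alpha` moves the sampler back towards uniform.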

Drawing on recent research such as the 2022 arXiv paper "Towards Improving Long-Tail Entity Predictions in Temporal Knowledge Graphs," BESS can be understood as a technique that balances the sampling process so that the model learns from both common and long-tail entities. This addresses typical KGC challenges such as sparse data and evolving graphs.

Knowledge Graphs capture the relationships between real-world entities as triples (head, relation, tail). The NeurIPS 2022 Competition Track programme included the Open Graph Benchmark Large-Scale Challenge (OGB-LSC), which encourages researchers to work with realistically sized datasets and develop solutions for real-world needs.
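To make the triple representation concrete, here is a minimal sketch of how (head, relation, tail) facts might be stored and indexed for completion queries of the form (head, relation, ?). The `Triple` type, the `index_tails` helper and the example facts are illustrative choices for this article, not part of any OGB-LSC tooling.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One knowledge graph fact: (head, relation, tail)."""
    head: str
    relation: str
    tail: str


def index_tails(triples):
    """Map each (head, relation) query to the set of known tails."""
    index = defaultdict(set)
    for triple in triples:
        index[(triple.head, triple.relation)].add(triple.tail)
    return index


# Illustrative facts only.
facts = [
    Triple("Marie Curie", "born_in", "Warsaw"),
    Triple("Marie Curie", "field", "Physics"),
    Triple("Marie Curie", "field", "Chemistry"),
]
known_tails = index_tails(facts)
```

Knowledge Graph Completion then amounts to predicting tails that are missing from such an index for a given (head, relation) pair.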

In a striking demonstration of BESS's potential, Graphcore submitted the winning entry to the Knowledge Graph track of OGB-LSC@NeurIPS 2022. The entry was an ensemble of 85 Knowledge Graph Embedding (KGE) models: the 25 best models each of TransE, DistMult and ComplEx, plus the 5 best models each of TransH and RotatE (3 × 25 + 2 × 5 = 85). It achieved a validation Mean Reciprocal Rank (MRR) of 0.2922 and an MRR of 0.2562 on the test-challenge dataset.
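For readers unfamiliar with the metric, the sketch below computes Mean Reciprocal Rank in its standard form: the reciprocal of the position of the correct answer in each ranked list, averaged over all queries. The function names and the optional `cutoff` argument are assumptions for this example and do not reproduce the official OGB-LSC evaluator.

```python
def reciprocal_rank(ranked_candidates, correct, cutoff=None):
    """Return 1/position of `correct` in the ranked list, or 0.0 if absent.

    If `cutoff` is given, only the first `cutoff` candidates are considered.
    """
    for position, candidate in enumerate(ranked_candidates, start=1):
        if cutoff is not None and position > cutoff:
            break
        if candidate == correct:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(predictions, answers, cutoff=None):
    """Average the reciprocal rank over all queries."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one query is required")
    total = sum(
        reciprocal_rank(ranked, correct, cutoff)
        for ranked, correct in zip(predictions, answers)
    )
    return total / len(answers)


# Three toy queries: correct answer ranked 1st, ranked 2nd, and missing.
toy_predictions = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
toy_answers = ["a", "a", "z"]
toy_mrr = mean_reciprocal_rank(toy_predictions, toy_answers)
```

An MRR of 1.0 would mean the correct entity is always ranked first, so values such as those reported above indicate how far up the list the right answer tends to appear.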

The winning submission did not use relation features in the models. At inference time, the complete set of entities is traversed for a given query, and an ordered list of the top-K results is returned. The BESS approach guarantees that only tail embeddings have to be exchanged across workers.
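The inference step described above can be sketched as follows: score every candidate entity as the tail of a query (head, relation, ?) and keep the K best. The scoring functions follow the commonly published definitions of TransE (shown here with an L2 distance), DistMult and ComplEx. The `top_k_tails` helper and the tiny embedding table are hypothetical, and a real system would run this in batches on accelerators rather than in pure Python.

```python
import heapq
import math


def transe_score(h, r, t):
    """TransE: negative L2 distance between h + r and t (higher is better)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def distmult_score(h, r, t):
    """DistMult: trilinear product sum_i h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def complex_score(h, r, t):
    """ComplEx: real part of sum_i h_i * r_i * conj(t_i), complex embeddings."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


def top_k_tails(head_vec, rel_vec, entity_table, k, score_fn=transe_score):
    """Traverse all entities and return the k best tail candidates, best first."""
    return heapq.nlargest(
        k,
        entity_table,
        key=lambda entity: score_fn(head_vec, rel_vec, entity_table[entity]),
    )


# Hypothetical 2-dimensional embeddings for three entities.
toy_entities = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 2.0]}
toy_top2 = top_k_tails([0.0, 0.0], [1.0, 0.0], toy_entities, k=2)
```

Passing `score_fn=distmult_score` or `score_fn=complex_score` swaps the model without changing the traversal, which is also what makes it straightforward to combine several model types in an ensemble.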

On the Graphcore Bow Pod16, these collective communications run over fast IPU-IPU links, which makes a separate parameter server unnecessary. BESS replicates relation embeddings across all workers and updates them using an AllGather operation.
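The idea that only tail embeddings cross worker boundaries can be illustrated with a small simulation. In this sketch, entities are assigned to workers, each worker builds its batch from triples whose heads it owns, and it draws the same number of triples for every tail shard so that traffic is balanced. The modulo-based `shard_of` rule, the function names and the `per_pair` parameter are assumptions for illustration and are not Graphcore's implementation.

```python
import random
from collections import defaultdict


def shard_of(entity_id, num_workers):
    """Assign an integer entity id to a worker (hypothetical modulo rule)."""
    return entity_id % num_workers


def bucket_by_shard_pair(triples, num_workers):
    """Group triples by (head shard, tail shard)."""
    buckets = defaultdict(list)
    for head, relation, tail in triples:
        pair = (shard_of(head, num_workers), shard_of(tail, num_workers))
        buckets[pair].append((head, relation, tail))
    return buckets


def sample_worker_batch(buckets, worker, num_workers, per_pair, rng):
    """Draw `per_pair` triples for each tail shard; all heads stay local."""
    batch = []
    for tail_shard in range(num_workers):
        pool = buckets.get((worker, tail_shard), [])
        if pool:
            batch.extend(rng.choices(pool, k=per_pair))
    return batch


def remote_tails(batch, worker, num_workers):
    """Tail entities whose embeddings must be fetched from other workers."""
    return {tail for _h, _r, tail in batch if shard_of(tail, num_workers) != worker}


# Toy graph with integer entity and relation ids, split over 2 workers.
toy_graph = [(0, 0, 1), (0, 1, 2), (2, 0, 3), (1, 0, 0), (3, 1, 1)]
toy_buckets = bucket_by_shard_pair(toy_graph, num_workers=2)
worker0_batch = sample_worker_batch(toy_buckets, 0, 2, per_pair=2, rng=random.Random(0))
worker0_remote = remote_tails(worker0_batch, 0, 2)
```

Because every head in a worker's batch is local, the only embeddings that have to travel are the tails listed by `remote_tails`, while relation embeddings are assumed to be present on every worker.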

The goal of OGB-LSC is to push the boundaries of graph representation learning, and the BESS approach, with its focus on improving entity prediction in knowledge graphs with imbalanced data distributions, is a significant step towards achieving this goal.

Individual validation MRRs varied between models, and some models with lower individual scores, such as DistMult and ComplEx, still performed exceptionally well in ensembles.

The paper "BESS: Balanced Entity Sampling and Sharing for Large-Scale Knowledge Graph Completion" by Cattaneo et al. (2022) [1], published as a preprint on arXiv, describes the strategy in detail. As BESS gains traction, it may come to play a significant role in the future of KGC.

[1] Cattaneo, A., et al. (2022). BESS: Balanced Entity Sampling and Sharing for Large-Scale Knowledge Graph Completion. arXiv preprint arXiv:2211.12281.

Data and cloud computing technologies underpin the development and deployment of techniques like BESS, supporting both its incremental training framework and the data exchange between workers.

The effectiveness of BESS on both common and long-tail entities underscores the role this infrastructure plays in tackling sparse data and evolving graphs in KGC.
