- Representing data with Knowledge Graphs
- Running example: countries in DBpedia
- Creating entity embeddings with RDF2Vec
- Introducing pyRDF2Vec
- Biased walking or sampling strategies
- Walk modifications and transformations
- Loading KGs with pyRDF2Vec
- Creating our first embeddings
- Tuning the hyper-parameters
- Trying different walking strategies
- Sampling deeper walks
- Shortcomings of RDF2Vec and research challenges
- Code and data availability

**Graphs** are data structures that are useful to represent ubiquitous phenomena, such as social networks, chemical molecules and recommendation systems. One of their strengths lies in the fact that they explicitly model relations (i.e. **edges**) between individual units (i.e. **…**

The current COVID-19 pandemic calls for an effective vaccine. Researchers from Stanford University are studying mRNA vaccines, as these could quickly provide a candidate solution. One challenge with this type of vaccine is stability: they tend to spontaneously degrade unless kept under intense refrigeration. To find a stable mRNA sequence, the Stanford researchers reached out to the Kaggle community through the OpenVaccine: COVID-19 mRNA Vaccine Degradation Prediction competition. The competition launched on 11 September and lasted only 26 days, which is unusually short for Kaggle due…

**EDIT 04/06/2020: We have now released our code!**

On the 24th of February, Kaggle released a new “Research” competition, with a prize pool of $25,000, in collaboration with the University of Liverpool. In this competition, we were provided with electrical signals corresponding to ion channel data, and our goal was to create an algorithm that could automatically identify the number of channels that were open at each time point. Initially, the competition launched with an exotic “Quadratic Weighted Cohen Kappa Score (QWK)”, but this resulted in near-perfect results a few moments after release, which made me reluctant to join. After…

Graphs are data structures that are useful to represent ubiquitous phenomena, such as social networks, chemical molecules and recommendation systems. One of their strengths lies in the fact that they explicitly model relations (i.e. edges) between individual units (i.e. nodes), which adds an extra dimension to the data. We can illustrate this enrichment of data using the Cora citation network. This dataset contains a bag-of-words representation for a few hundred papers and the citation relations between these papers. If we apply dimensionality reduction (t-SNE) to create a 2D plot of the bag-of-words representations, we can see…

Every year, Google organizes a programming competition called Hash Code. The goal is to solve an optimization problem in 4 hours in a team of 2 to 4 people. The competition attracts tens of thousands of competitors from all over the world. This was the first year I heard of it, so I decided to participate along with Bram Steenwinckel and Pieter De Cremer.

The goal of this year was to obtain the highest score by scanning different books that were distributed over different libraries in a limited number of days. Copies of the same book could be available in…

Many real-world processes produce data over time, giving rise to temporal data or timeseries. As opposed to tabular data, neighboring observations (i.e. observations that are close in time) are highly correlated, which requires special care when analyzing timeseries. One task that can be performed on timeseries is classification. Example use cases include surface detection based on accelerometer data, classifying the type of electrical device based on electricity usage, or classifying a leaf’s type based on contour information.

Shapelets are small subseries, or parts of the timeseries, that are informative or discriminative for a certain class. They can…
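As a rough illustration of how a shapelet can be matched against a timeseries, the sketch below slides the shapelet over the series and keeps the smallest Euclidean distance. The function name `shapelet_distance` and the toy data are hypothetical and not taken from the original post; the sliding-window minimum distance is the commonly used matching measure, assumed here rather than quoted from the excerpt.

```python
import math


def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any
    equally long window of the series (hypothetical helper)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    # Slide a window of the shapelet's length over the series.
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, dist)
    return best


# Toy data: one series contains the spike pattern, the other does not.
spike = [0.0, 1.0, 0.0]
with_spike = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
flat = [0.0] * 6

print(shapelet_distance(with_spike, spike))
print(shapelet_distance(flat, spike))
```

A small distance suggests the pattern occurs somewhere in the series, so the distance to each shapelet could serve as a feature for a classifier.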

This year, Bram Steenwinckel and I ensured our Christmas gifts by being the 28th team (of more than 1600 participants) to deliver Santa an optimal schedule for his annual 100-day workshop. This blog post contains a problem description and the different steps we took to achieve this optimal solution.

**DISCLAIMER:** Bram and I are, by no means, experts in optimization. Nevertheless, we had a lot of fun and learned a lot.

Each year, Santa organizes a Christmas workshop spanning 100 days, allowing 5000 families to attend it. Each of the families (consisting of a certain number of family members) can…

Last year and this year, I wrote a blog post about my solution for an AI competition which colleagues (Elias, Pieter, Ozan and Cedric) and I created, specifically for first-year engineering students. In this competition, players can write their own agent that plays a game for them against agents of other players. To facilitate this competition, a platform was created that allowed the students to register themselves, upload their code and check the ranking of their agent on a leaderboard. The leaderboard is constantly updated based on results of games, simulated on a periodic basis. …

A new academic year means a new batch of bright students taking their first steps in their engineering career. As we did last year, three of my fellow course assistants (Elias, Pieter and Ozan) and I created a platform for an AI bot competition, to be hosted during the Informatics course (in which the students learn to program in Python), given by prof. Dhoedt at Ghent University. This year, we decided to host the game of Tron (sometimes called continuous snake or Lightriders), as shown below.

Since both Riddles.io and CodinGame (which you should definitely check out if you are interested…

In the context of the ‘Informatics’ course, where the first-year engineers at the University of Ghent learn to code in Python, we set up an AI bot competition platform. The goal was to create a bot that plays the game connect-four by implementing the following function:

def generate_move(board, player, saved_state):
    """Contains all code required to generate a move,
    given a current game state (board & player)

    Args:
        board (2D np.array): game board (element is 0, 1 or 2)
        player (int): your player number (float)
        saved_state (object): returned value from previous call

    Returns:…
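To make the interface more concrete, here is a minimal sketch of a bot that plays a random legal move. The return value is cut off in the excerpt above, so this sketch assumes the function returns a `(column, saved_state)` tuple; it also assumes that row 0 is the top of the board and that 0 marks an empty cell. None of these details are confirmed by the original post.

```python
import random


def generate_move(board, player, saved_state):
    """Pick a random column that still has room for a disc.

    Assumptions (not confirmed by the excerpt): row 0 is the top of the
    board, 0 marks an empty cell, and the function returns a
    (column, saved_state) tuple.
    """
    n_columns = len(board[0])
    # A column is playable as long as its top cell is still empty.
    open_columns = [col for col in range(n_columns) if board[0][col] == 0]
    if not open_columns:
        raise ValueError("board is full, no legal move left")
    return random.choice(open_columns), saved_state


# Usage on an empty 6x7 board; a 2D np.array is indexed the same way.
empty_board = [[0] * 7 for _ in range(6)]
column, state = generate_move(empty_board, player=1, saved_state=None)
print(column)
```

A random bot like this is mostly useful as a baseline to check that a stronger agent wins consistently.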

Data Scientist || Kaggle Master