About us

Machine Learning Research is Allegro’s R&D lab created to develop and apply state-of-the-art machine learning methods, helping Allegro grow and innovate with artificial intelligence. Beyond bringing AI to production, we are committed to advancing the understanding of machine learning through open collaboration with the scientific community.

Teams

CX Robots

We focus on using NLP models to understand and automate communication at Allegro, e.g. automatically answering questions sent to our customer support. Our main research directions are pretraining and evaluating large language models, semi-supervised clustering, and human-in-the-loop NLP.

Learning to Rank

In Learning to Rank, our goal is to develop ranking models that find the optimal ordering of items for a given search results list, based on users’ past interactions. Such models constitute the final stage of Allegro’s search engine, serving millions of searches a day.

Some of the research problems we tackle are:

  1. Incorporating multimodal data (textual, visual, tabular) into an end-to-end ranking model,
  2. Personalizing the search engine,
  3. Developing novel ranking architectures and loss functions.

Visual Search

In Visual Search, we build machine learning models that produce image embeddings suitable for similarity search. The main challenge is to make these embeddings sensitive to relevant visual traits of products, such as category, style, colour, and pattern, while keeping them insensitive to irrelevant information such as the background, the presence of a model, or the camera angle.
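As a rough illustration of what similarity search over embeddings means, the sketch below ranks a toy catalog by cosine similarity to a query vector. The three-dimensional vectors, the product names, and the helper functions `cosine_similarity` and `most_similar` are all hypothetical; real image embeddings come from a trained model and are typically searched with an approximate nearest-neighbour index rather than a linear scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    catalog: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k catalog items most similar to the query embedding."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings: the two dress photos differ only in irrelevant
# traits (studio shot vs. worn by a model), so they should lie close together.
catalog = {
    "red_dress_studio": [0.9, 0.1, 0.0],
    "red_dress_on_model": [0.8, 0.2, 0.1],
    "blue_jeans": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
results = most_similar(query, catalog)
```

In this toy setup both dress images rank above the jeans, which is the behaviour the embeddings are meant to have: nearby vectors for the same product regardless of background or presentation.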

PCS Automation

We employ a diverse set of machine learning techniques to improve the product-based experience on Allegro. The problems we solve include product matching, i.e. inferring which product is being sold in a merchant-created offer, and automatic integration of product definitions from external product catalogs. Our research directions include sampling methods in similarity learning and extreme classification methods.

Recommendations

The main purpose of our team is to address users’ needs by showing them a broad range of products they may be interested in, serving as a source of inspiration and connecting them with useful, contextual offers. We ground our algorithms in the past collective behaviour of our user base, and we also work towards incorporating content features of the items into the models. Our main challenges are building novel algorithms that give good recommendations to our users and making them operate at scale. Both are significant endeavours, considering the sheer amount of traffic Allegro serves daily.

We focus our research on:

  1. Building item representations that can serve as a basis for retrieval,
  2. Improving ways to detect user intents in a clear (and useful) way,
  3. Current trends in recommender systems.

Reinforcement Learning

We aim to enhance various Allegro projects with exploratory algorithms, which are capable of not only exploiting historical data but also exploring through interactions with the world. Currently, we are working on the optimization of Search Engine Marketing (SEM) and Content Optimization projects. Our main research directions include contextual bandits, alternatives to A/B testing based on causal impact discovery, and offline RL.
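To make the exploit-versus-explore idea concrete, here is a minimal sketch of an epsilon-greedy contextual bandit. It is a simplification under stated assumptions, not a description of the team's methods: contexts are discrete labels, reward estimates are plain running means, and the class name, the context and action labels, and the reward values are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, Sequence


class EpsilonGreedyContextualBandit:
    """Tabular epsilon-greedy bandit with one reward estimate per (context, action)."""

    def __init__(self, actions: Sequence[Hashable], epsilon: float = 0.1, seed: int = 0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, action) -> number of updates
        self.values = defaultdict(float)  # (context, action) -> mean observed reward

    def select(self, context: Hashable) -> Hashable:
        # Explore: with probability epsilon, try a uniformly random action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        # Exploit: otherwise pick the action with the best estimate so far.
        return max(self.actions, key=lambda a: self.values[(context, a)])

    def update(self, context: Hashable, action: Hashable, reward: float) -> None:
        # Incremental update of the running mean reward.
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


# Hypothetical usage: choose between two banner variants per device type.
bandit = EpsilonGreedyContextualBandit(["banner_a", "banner_b"], epsilon=0.1)
chosen = bandit.select("mobile")
bandit.update("mobile", chosen, reward=1.0)  # e.g. the user clicked
```

Setting `epsilon` to zero gives a purely exploitative policy that only reuses historical estimates; a positive value keeps gathering data about actions that currently look worse.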

Talks

Open-Source

allRank

A framework for training neural Learning-to-Rank (LTR) models, featuring implementations of:

  • common pointwise, pairwise and listwise loss functions,
  • fully connected and Transformer-like scoring functions,
  • commonly used evaluation metrics like Normalized Discounted Cumulative Gain (NDCG) and Mean Reciprocal Rank (MRR),
  • click-models for experiments on simulated click-through data.
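For readers unfamiliar with the two metrics named above, the following is a minimal standalone sketch of NDCG and MRR. It does not use allRank's API, and the exponential gain `2^rel - 1` is an assumption (one common convention); allRank's own implementation may differ in gain function and tie handling.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions (exponential gain)."""
    return sum(
        (2 ** rel - 1) / math.log2(pos + 2)
        for pos, rel in enumerate(relevances[:k])
    )


def ndcg(relevances: Sequence[float], k: int) -> float:
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


def mrr(ranked_lists: Sequence[Sequence[int]]) -> float:
    """Mean over queries of 1 / rank of the first relevant item (0 if none)."""
    total = 0.0
    for labels in ranked_lists:
        for pos, label in enumerate(labels):
            if label > 0:
                total += 1.0 / (pos + 1)
                break
    return total / len(ranked_lists)
```

Each input list holds the relevance labels of the items in the order the model ranked them, so a perfectly ordered list yields an NDCG of 1, and a relevant item at the top of every list yields an MRR of 1.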

Try it!

KLEJ Benchmark

The KLEJ benchmark (Kompleksowa Lista Ewaluacji Językowych) is a set of nine evaluation tasks for Polish language understanding. Key benchmark features:

  • It contains a diverse set of tasks from different domains and with different objectives,
  • Most tasks are created from existing datasets, but we also release a new sentiment analysis dataset from the e-commerce domain.

Try it!

HerBERT

HerBERT is a BERT-based language model trained on six different corpora for Polish language understanding. It achieves state-of-the-art results on multiple downstream tasks, including the KLEJ Benchmark and Part-of-Speech tagging. We release both Base and Large variants of the model as part of the transformers library for anyone to use.

Try it!

Publications

2021

Subgoal Search For Complex Reasoning Tasks

Authors: Konrad Czechowski, Tomasz Odrzygóźdź, Marek Zbysiński, Michał Zawalski, Krzysztof Olejnik, Yuhuai Wu, Łukasz Kuciński, Piotr Miłoś

Accepted at: Conference and Workshop on Neural Information Processing Systems (NeurIPS)

Read

2021

HerBERT: Efficiently Pretrained Transformer-based Language Model for Polish

Authors: Robert Mroczkowski, Piotr Rybak, Alina Wróblewska, Ireneusz Gawlik

Accepted at: BSNLP, accepted long paper

Read

2020

KLEJ: Comprehensive Benchmark for Polish Language Understanding

Authors: Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik

Accepted at: ACL 2020, accepted long paper

Read

2020

Context-Aware Learning to Rank with Self-Attention

Authors: Przemysław Pobrotyn, Tomasz Bartczak, Mikołaj Synowiec, Radosław Białobrzeski, Jarosław Bojar

Accepted at: SIGIR eCommerce Workshop 2020, contributed talk

Read

2020

NeuralNDCG: Direct Optimisation of a Ranking Metric via Differentiable Relaxation of Sorting

Authors: Przemysław Pobrotyn, Radosław Białobrzeski

Accepted at: The 2021 SIGIR Workshop On eCommerce (SIGIR eCom ’21)

Read

2020

BERT-based similarity learning for product matching

Authors: Janusz Tracz, Piotr Wójcik, Kalina Jasinska-Kobus, Riccardo Belluzzo, Robert Mroczkowski, Ireneusz Gawlik

Accepted at: EComNLP 2020 COLING Workshop on Natural Language Processing in E-Commerce

Read

Job offers

Research Engineer - Machine Learning (Reinforcement Learning)

Warszawa, Kraków, Poznań, Toruń, Wrocław, Gdańsk, Katowice, Łódź, Lublin

Apply

Research Engineer - Machine Learning (Ranking and Recommendations)

Warszawa, Poznań, Kraków, Toruń, Wrocław, Gdańsk, Katowice, Łódź, Lublin

Apply

See more job offers