Welcome, I'm

Jakob Salomonsson

Machine learning engineer / data scientist. I enjoy bringing ideas to life and feel excited when working on projects that benefit the greater good. And if there's some novelty to it, so much the better. Originally from Sweden, currently residing in sunny Madrid, Spain.

The purpose of this website is mainly to offer a showcase for some of the projects I've worked on so far. Feel free to take a scroll!

And oh yeah, I normally tune into this playlist while working.

Some of my work


How to Fine-Tune an NLP Transformer Model on a Task of Your Choice

2023

There’s been a lot of buzz around Natural Language Processing, or NLP, over the last few years, following important technological advances that have allowed more performant models even in situations with limited access to data. The buzz exploded in November 2022 when OpenAI’s ChatGPT was launched. As a result, I’d like to take the opportunity to show how you can fine-tune a pre-trained model on a task of your choice on your own.

Additionally, I will leverage both structured and unstructured data by processing both in the same model architecture. This can yield important performance improvements on some tasks.

Predicting the Fare on a Billion Taxi Trips with BigQuery

2022

How long does it take, and how much does it cost, to analyse a billion taxi trips and train a model on them in the cloud? When do most trips occur, what affects the total fare amount the most, and how accurately can we predict it?

Here, I'm using GIS data to try to answer those questions and many more.

Spotify Streams in Europe & North America 2017-2021

2022

Interactive visualisation for country-by-country streams on Spotify in Europe and North America over the years 2017-2021.

The major streaming countries, in absolute numbers, are the United States, United Kingdom and Mexico, while the Nordic countries have more than twice as many streams per capita as most other countries. December seems to be an especially intensive streaming period, while the number of streams in the U.S. has declined since 2018.

Go full-screen for the best experience.

What Makes a Song Popular on Spotify?

2022

Let's learn more about what makes a song popular on Spotify. My intuition is that shorter, vocal and more energetic songs are more popular among the general population. These are the kind of songs you hear most frequently when tuning into the radio or checking the top tracks of the month.

Is it possible to prove this statistically? Moreover, is your music taste more in line with the general public's, or have you developed a unique taste of your own?

Training a Model on 100 Million Ratings with Spark on a Mac M1

2022

Is it feasible to train a model on 100 million ratings using nothing more than a common laptop? Let's find out.

I've been refreshing my Spark skills lately and wanted to try it on a dataset I wouldn't be able to handle with otherwise common data science tools such as Pandas and NumPy. I came across the dataset, and the Netflix challenge it originates from, some time ago, but never really had any reason to work on it. Until now.

This mini project shows that it's possible to work with such large datasets on your standard equipment at home without having to use cloud services. The final model was trained in less time than it takes to join an outdoor exercise session.

I should mention, though, that you can speed things up significantly using the cloud.

Engineer & Founder of a Startup in Stealth Mode

2021 - present

We believe it's better to learn how to think rather than what to think, and if we can improve the learning output from our education system by only a small fraction, the implications for our society would be staggering.

Predicting Tyre Tread Depth on a Small Dataset using Deep Learning

2021

Maintaining proper wheel quality is incredibly important for good road safety. For organisations owning larger fleets of vehicles, taking them off the road for maintenance can be costly. Being able to better plan when maintenance is needed, and on which parts of the vehicle, through smarter predictions can significantly reduce expenses.

In this project we collected images of tyres (~300 in total) to test the hypothesis that tyre tread depth can be predicted from single camera images using deep learning techniques. Despite the small dataset, we chose a design that can easily be scaled up to millions of samples with virtually no changes to the code.

Although the dataset is very small, this early-stage experiment seems to indicate that it is possible to predict the tread depth of tyres using deep learning techniques. More data should, however, be collected to verify the results with greater confidence. Another, plausibly more important, question is whether taking a photo of sufficient quality of a tyre is easier than simply using current methods.

Detecting Melanoma Skin Cancer with Computer Vision

2021

There were around 300k newly diagnosed melanoma and one million non-melanoma skin cancer cases worldwide in 2018. The numbers are estimated to increase in the coming years. My home country, Sweden, is ranked 6th in terms of cancer rates per capita, averaged over all sexes. The ability to quickly and accurately check a spot on the skin has the potential to benefit millions of people.

We will test whether it's possible to distinguish melanoma from non-melanoma skin cancer using deep learning techniques. To save on computational costs, a small dataset is chosen (~3k images), but the current design will be able to process larger datasets with only minor changes to the code.

Cancer Detection with Abnormal Chromosome Levels using Machine Learning

2020

This work was similar to the multi-analyte blood test project, but focused on improving sensitivity by evaluating abnormalities in the blood. Sensitivity was indeed improved to 98.1%, significantly higher than before, while maintaining specificity at 99%.

It turns out that simply using a more performant, but very common, algorithm improved the results by a large margin. The original research team may simply not have considered it for their publication.
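For readers less familiar with the two metrics quoted above, here is a minimal sketch of how sensitivity and specificity can be computed from binary predictions. The function name and the toy labels are made up for illustration; they are not taken from the project or its data, and the sketch assumes both classes are present in the labels.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = cancer.

    Hypothetical helper for illustration only; assumes y_true contains
    at least one positive and one negative sample.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancers caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cancers missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged

    sensitivity = tp / (tp + fn)  # share of actual cancers detected
    specificity = tn / (tn + fp)  # share of healthy samples cleared
    return sensitivity, specificity


# Toy labels, invented for this example (not project data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Holding specificity fixed while raising sensitivity, as described above, means catching more actual cancers without flagging more healthy samples.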

Cancer Detection Through Multi-Analyte Blood Test

2020

Thrive Earlier Detection launched in early 2019 with $110M of Series A funding. The project piqued my interest as I thought I could provide an improved cancer detection model with their published proteomics and genomics data.

My model improved sensitivity for breast, colorectal and pancreatic cancers by 100%, 31% and 21%, respectively, while maintaining specificity at >99%. These three cancer types account for more than four million new cases annually.

I gained a deeper understanding of the entire Machine Learning workflow as well as insights into product/market fit as a result of the project.

Although it wasn't a direct result of my work, the company was later acquired for north of $2 billion.

Finding Habitable Exoplanets with Deep Learning

2018

I developed a complete pre-processing pipeline and a model architecture in order to apply deep learning to analyse light emitted from distant stars. The data was transformed into large spectra (i.e. images) and fed into convolutional neural networks to predict three target variables: temperature, gravity and metallicity. The total size of the dataset was close to 100 GB after pre-processing.

The long-term goal of this project is to find habitable exoplanets, and this new method reduced the error of earlier machine learning approaches by a factor of more than thirty-five.

Predicting Stock Prices

2018

My motives for this project were more selfish; I hoped that an automated stock-trading system for the Stockholm Stock Exchange might help me pay off student loans.

The project was tougher than expected. Still, I learned a lot about the application of deep learning to time series data, as well as real-time data collection techniques.

Let's Connect


Are you interested in collaborating on an idea, do you need help with data science, or do you simply want to extend your network?

Feel free to reach out. I'm always on the lookout for inspiring people to connect with. I welcome bribes in the form of ice-cold bubble teas.