How Algorithms Decide What You See: The Maths Behind Recommendations


Introduction

Imagine this: you open YouTube and start watching a video. Within seconds, the algorithm begins adjusting what it will show you next. You watch a couple of minutes, get bored, and open TikTok to scroll for ten minutes. With every scroll, each video feels a little more relatable. It can feel as if these platforms know you personally. In reality, what feels natural and intuitive is the result of mathematics and statistical modelling working in the background.

At its core, every recommendation system attempts to solve one problem: "What should this user see next?" Platforms such as YouTube and Netflix rely heavily on mathematical optimisation, a field of mathematics concerned with maximising or minimising specific outcomes. For YouTube and Netflix, the objective is typically to maximise watch time, engagement, and long-term retention. Because much of their content is long-form, accurately predicting what the user might want to see next is especially crucial: a poor recommendation doesn't simply lead to a skipped short, but to the user leaving the platform entirely.

From Human Actions to Data

In order to make these predictions, platforms need to translate small human actions into data. Every time you pause a video, replay a certain section, or click away, the action becomes a measurable piece of data that an algorithm can use. However, not all actions are treated equally: finishing a video carries more weight than simply hovering over it. Each of these signals then recalibrates the system's estimate of what you are most likely to watch next.
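As a minimal sketch, this weighting of signals might look like the following Python snippet. The signal names and weight values are invented for illustration; real platforms learn their weightings from data rather than hard-coding them.

```python
# Hypothetical signal weights: invented for illustration, not any
# platform's real values. Stronger actions get larger weights.
SIGNAL_WEIGHTS = {
    "completed": 1.0,     # finishing a video is the strongest signal
    "replayed": 0.7,
    "paused": 0.2,
    "hovered": 0.1,       # hovering barely counts
    "clicked_off": -0.5,  # leaving early counts against the video
}

def engagement_score(events):
    """Combine a user's observed actions on one video into a single score."""
    return sum(SIGNAL_WEIGHTS.get(event, 0.0) for event in events)

score = engagement_score(["hovered", "completed", "replayed"])
print(round(score, 2))  # 1.8
```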

Vectors and Dimensional Space

So how do these platforms convert this data mathematically? Vectors. To put it simply, a vector can be seen as an array of numbers that describe certain characteristics. In recommendation systems, both the person consuming the content and the content being consumed can be represented as vectors in a multi-dimensional space. Each dimension corresponds to a specific feature such as genre, watch time, and interaction frequency. Although this space may contain thousands of different dimensions, the mathematics allows the system to accurately calculate the relationships between them by measuring how close or distant these vectors are from one another. The closer your vector is to a video's vector, the more likely it is to be recommended to you.
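To make this concrete, here is a toy sketch in Python. The feature names and values are assumptions chosen for illustration; real systems use thousands of dimensions rather than four hand-picked ones.

```python
# Toy feature space (assumed dimensions): each position measures one
# characteristic, e.g. [comedy, science, chess, normalised watch time].
user = [0.9, 0.1, 0.0, 0.6]   # mostly watches comedy, medium-length sessions
video = [0.8, 0.0, 0.1, 0.5]  # a comedy video with a similar length profile

# Both vectors live in the same space, so they can be compared directly:
# the closer these two vectors are, the stronger the recommendation.
assert len(user) == len(video)
```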

Measuring Closeness: Euclidean Distance

So how exactly do we calculate "closeness" between vectors? In high-dimensional space, the distance between vectors can be measured with formulae such as cosine similarity or Euclidean distance. These methods use the geometry of the vectors to show how closely two sets of characteristics align. If your viewing behaviour strongly matches the features of a particular video, the distance between the vectors shrinks, and vice versa.

One of the simplest methods for calculating distance is Euclidean distance. Each vector is treated as a single point in dimensional space, and the distance between the points is then calculated. The formula is derived directly from Pythagoras' theorem:

In a 2D plane:

$$d = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2}$$

Here the two points are a = (a₁, a₂) and b = (b₁, b₂). The formula takes the horizontal difference (a₁ − b₁) and the vertical difference (a₂ − b₂), then applies Pythagoras' theorem: the two squared differences sum to d², and taking the square root gives the distance d, the length of the hypotenuse.

In n-dimensions:

$$d = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$$

Here, a = (a₁, a₂, a₃, … aₙ) and b = (b₁, b₂, b₃, … bₙ), and each coordinate represents a feature. For each feature i, find the difference, square it, sum across all dimensions, and then take the square root. This equation is essentially the same as the previous one, just extended into n dimensions.
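The n-dimensional formula translates directly into code. A minimal Python implementation:

```python
import math

def euclidean_distance(a, b):
    """Square root of the summed squared differences across all dimensions."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# 2D sanity check: a 3-4-5 right triangle
print(euclidean_distance([0, 0], [3, 4]))  # 5.0

# The same function works unchanged in any number of dimensions
print(euclidean_distance([1, 2, 3], [4, 6, 3]))  # 5.0
```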

So which platforms actually use Euclidean distance? In practice, very few. Euclidean distance is poorly suited for recommender systems because users have widely different consumption behaviour. One user might have 10 hours of screen time while another has 10 minutes. Euclidean distance is very sensitive to magnitude, meaning that if someone consumes far more content overall, their vectors will have larger values and the calculated distance will increase — even if their viewing patterns are actually similar.

Cosine Similarity: The Better Approach

If Euclidean distance has these limitations, what do platforms actually use? Cosine similarity. Rather than measuring straight-line distance, cosine similarity measures the angle between two different vectors. This allows systems to focus on proportion rather than raw magnitude.

The benefit of cosine similarity is that it can handle vastly different amounts of consumed content — if two users' behavioural patterns point in similar directions, cosine similarity will identify them as aligned. This makes it far more effective when working with behavioural data.
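Formally, cosine similarity is the dot product of the two vectors divided by the product of their magnitudes:

$$\cos\theta = \frac{a \cdot b}{\|a\| \, \|b\|}$$

A minimal Python sketch shows why this helps. The two viewers below have identical taste profiles but very different total watch time (the numbers are invented for illustration):

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai ** 2 for ai in a))
    norm_b = math.sqrt(sum(bi ** 2 for bi in b))
    return dot / (norm_a * norm_b)

# Hours watched per genre: the heavy viewer consumes 10x more content,
# but both spend their time in exactly the same proportions.
heavy_viewer = [6.0, 3.0, 1.0]
light_viewer = [0.6, 0.3, 0.1]

# Euclidean distance would call these users far apart; cosine similarity
# sees two vectors pointing in the same direction and reports ~1.0.
print(round(cosine_similarity(heavy_viewer, light_viewer), 6))  # 1.0
```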

Matrix Factorisation and Learned Embeddings

However, even cosine similarity is often only part of the picture. Large-scale modern platforms don't rely solely on simple geometric comparisons. They also use more advanced techniques such as matrix factorisation and learned embeddings. These methods take large amounts of human interaction data and compress it into a lower-dimensional representation that captures hidden preference patterns. Rather than labelling specific features such as genre and watch time, systems learn abstract factors directly from behavioural patterns.

For example, if many people who enjoy long science documentaries also happen to love shorter chess videos, the model identifies this correlation without ever being told those categories are related. Through matrix factorisation, each user and piece of content becomes a latent vector — a numerical representation of these hidden factors. Instead of geometry, recommendation becomes a scoring problem. The system computes the dot product between a user's vector and a content's vector. If the result is high, it predicts strong engagement between that user and that content.
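As a sketch of this scoring step, consider the snippet below. The latent vectors are invented for illustration; in a real system they would be learned from the user-item interaction matrix rather than written by hand.

```python
# Invented latent vectors: each position is a hidden factor with no
# human-assigned label, learned (in practice) by matrix factorisation.
user_vec = [0.9, 0.2, -0.4]

items = {
    "science documentary": [0.8, 0.1, -0.3],
    "chess video": [0.7, 0.0, -0.5],
    "makeup tutorial": [-0.6, 0.9, 0.2],
}

def predicted_engagement(user, item):
    """Dot product of latent vectors: higher means stronger predicted engagement."""
    return sum(u * i for u, i in zip(user, item))

for name, vec in items.items():
    print(name, round(predicted_engagement(user_vec, vec), 2))
# The documentary and the chess video both score high for this user,
# even though no one told the model those categories are related.
```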

Conclusion

Ultimately, what feels natural is just ongoing optimisation. Everything you do is tracked, and your position in a high-dimensional landscape is subtly shifted. Platforms calculate you — and through geometry, vectors, and similarity, they determine what you see next.

