Open and free web analytics, a new approach

05.04.2019

Like many other people who run their own website, I reached a point where I wanted some insight into how visitors interact with my blog and which content is viewed most and found most interesting.

The most obvious solution, and the one many people stumble upon first, is Google Analytics.
Its biggest benefits are that it is free up to a certain point, provides a nice dashboard, and is backed by a huge company.
Unfortunately, neither the platform's code nor what Google does with the data beyond presenting it to you is transparent, and given the company's business practices, the data is probably used to target people across the web.

Given recent developments in data privacy, such as the GDPR, this raises legal and ethical questions.

The alternatives

Many other people and businesses have had the same concerns, and a variety of alternatives have emerged, such as Open Web Analytics, Matomo and Fathom.
All of them are open source, which lets you review what is done with the data and host them on your own hardware to keep ownership of your data.

This is great for technical users with the right skills, but it leaves a lot of website owners stranded unless they are willing to pay, which makes Google Analytics the obvious choice for many.

A new solution

For a real alternative, I think we need a service that satisfies the following constraints:

free to use for the average user
open source
open data (all data should be accessible to anyone)
respect users' privacy

With these guidelines in mind, I spent the past few weeks building a side project I call Freelytics.
The project is still at an early stage, but it is already usable for getting some very basic information about your website, namely how many times each page has been visited.
All code is available on GitHub, but I want to give a quick overview of its architecture here.

Freelytics architecture

[Figure: Simple architecture diagram of Freelytics]

The architecture itself is fairly simple, consisting of four main components.

1: Tracking Script

The tracking script is responsible for collecting the relevant data and passing it on for storage.
It is written in plain JavaScript to keep it as lightweight as possible, so that it does not add bloat to a user's website.
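The post does not describe the exact payload the script sends, so the following is a minimal sketch, assuming a hypothetical JSON body such as `{"url": "https://example.com/page"}`. It shows how little data is needed to count page views: the receiving Go side parses the URL, drops the query string, and keeps only the domain and path. The `PageView` type and `parsePageView` function are illustrative names, not part of Freelytics.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// PageView is a hypothetical payload: the tracking script only reports the
// URL of the visited page, with no cookies or user identifiers.
type PageView struct {
	URL string `json:"url"`
}

// parsePageView decodes the JSON body and returns the lower-cased domain and
// the path of the visited page. The query string is discarded so that
// tracking parameters do not split one page into many entries.
func parsePageView(body []byte) (domain, path string, err error) {
	var pv PageView
	if err := json.Unmarshal(body, &pv); err != nil {
		return "", "", err
	}
	u, err := url.Parse(pv.URL)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return "", "", errors.New("unsupported URL scheme")
	}
	if u.Hostname() == "" {
		return "", "", errors.New("missing host")
	}
	path = u.Path
	if path == "" {
		path = "/" // treat "https://example.com" as the root page
	}
	return strings.ToLower(u.Hostname()), path, nil
}

func main() {
	domain, path, err := parsePageView([]byte(`{"url":"https://ehlers.berlin/blog/post?ref=x"}`))
	fmt.Println(domain, path, err)
}
```

Running it prints `ehlers.berlin /blog/post <nil>`.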

2: Dashboard

The dashboard is the most important part for users, since it displays all the data for their domain.
The data is accessible to anyone who knows the domain.
As far as the technologies go, this is probably the most complex part.

This is because the project's success as a viable alternative will depend on knowledge contributed by many different people from the community.
So I chose an approach that allows maximum flexibility in the choice of technologies while respecting best practices.
Currently it uses HTML for semantics, Sass for styling, JavaScript for interaction logic, Elm for visualizing data, and NGINX for routing.

The build system leaves room for more flexibility here, and suggestions are welcome. Just note that I have already decided against building an SPA (single-page application), to keep it easy for anyone to join in.

3: Database

The database is responsible for storing and analyzing the data, for example aggregating data points over a period of time.
I chose Postgres for this because it comes with a powerful query language (SQL) that most people who work with data will be familiar with.
The goal is to make all the data stored here available to people who would like to do research on it. Because of security and load concerns, I have not yet decided whether it should be open to everyone or available on request.

4: The API

The glue that holds everything together. It is written in Go, because the language is easy for new developers to pick up.

Again, you can check out all the details on GitHub. I have used Docker to create isolated development environments, making it easy for anyone to get started with development.

So how do you make money with this?

Making money is not a goal of this project! It is about creating a better internet.
The system currently runs on a VPS (~$60/yr) and uses a custom domain (~$15/yr).
Until many more people adopt it as a tool, this setup should hold up fine, and I can comfortably pay for it out of my own pocket.

Will it scale?

Most definitely. Because the system is split into separate components, each part can be scaled independently.
The one likely bottleneck is the database. Since the data model indexes everything under the URL the data is saved for, sharding by URL should in theory be straightforward.

Cool, what now?

My main focus right now is on providing documentation.
At the same time, I will try to figure out additional data points to collect that neither require cookies nor violate GDPR rules.
Of course, I hope that others will find this project worth working on and want to help out. Any expertise is welcome! And if you are not familiar with the tech yet but want to learn something new, I will be happy to assist you on the journey.

To that end, I will start adding issues that need help to the GitHub repository, but any idea is welcome.

But of course you should give it a try first.
It is available at https://freelytics.net/. Simply generate a tracking script at https://freelytics.net/generate-tracking-script and include it in your website, or check out the stats for my blog by entering ehlers.berlin on the dashboard (Hint: I am not an influencer).
