You might assume that an open-source project doesn’t really need usage analytics: users can always get in contact with you and tell you what they like or dislike about your product and what changes they expect.
In practice, developing an early-stage open-source project is like walking in the dark: you’re probably the only user giving honest feedback about pain points and about why the project sucks. A SaaS startup can slap Google Analytics on top and get a hint about where the typical user gets freaked out and leaves.
We are developing Metarank, an open-source tool for personalizing search results, category listings and recommendations. It’s a self-hosted backend service with no UI and no SaaS version, and the direct feedback we get is scarce.
What are the other 990 Metarank users doing? Are they happy and have no issues? Or did they spot a bug and never attempt to use the service again? We have no idea, but plenty of questions:
- Which geographical market should we focus on? We’re EU-based, so should we go for other EU companies?
- Are people using rare features? Which part of the service should we polish the most?
- What is the last action a user takes before throwing Metarank out of the window?
A common way to understand what these 990 shadow users are doing is usage tracking: the app periodically phones home and reports how things are going, with different levels of granularity and privacy intrusiveness (a minimal sketch of such a ping follows the list below):
- Grafana: anonymous stats on used features, opt-out.
- GitLab: non-anonymous stats, opt-out.
- Kong: anonymous error reports, opt-out.
- gcloud CLI: anonymous usage data, opt-in.
- Warp: non-anonymous, opt-out.
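A minimal sketch of what such a phone-home ping can look like, assuming a hypothetical telemetry endpoint, env variable and field names (none of this is how the tools above actually implement it):

```python
import json
import os
import platform
import time
import urllib.request
import uuid

# Hypothetical endpoint and opt-out variable: names are illustrative only.
TELEMETRY_URL = "https://telemetry.example.com/v1/ping"
OPT_OUT_VAR = "MYAPP_TELEMETRY"

# A random per-install UUID keeps the ping anonymous but still lets you
# distinguish "1000 installs" from "one install pinging 1000 times".
INSTANCE_ID = str(uuid.uuid4())


def send_usage_ping(app_version: str, features_used: list) -> None:
    # Respect opt-out: a single env variable / config flag disables everything.
    if os.environ.get(OPT_OUT_VAR) == "off":
        return
    payload = {
        "instance": INSTANCE_ID,      # anonymous install id, no user identity
        "version": app_version,       # which release is actually deployed
        "os": platform.system(),      # coarse environment info
        "features": features_used,    # which parts of the app were touched
        "ts": int(time.time()),
    }
    request = urllib.request.Request(
        TELEMETRY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Telemetry must never break the app: swallow network errors and timeouts.
    try:
        urllib.request.urlopen(request, timeout=2)
    except OSError:
        pass
```

The opt-in variants from the list above simply invert the default of that opt-out check; everything else stays the same.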
Community projects usually don’t care much about this kind of feedback, and large enterprises already know what their users do and want; small companies have no other way to harvest it.
How often do you personally provide feedback to open source project maintainers?
> – Why does X send a 1MB payload to a Chinese IP address every minute?
> – Oh, it’s just a version update check!
>
> (HN wisdom)
There are also shadier ways to implement tracking without clearly admitting that it exists:
- A version update check on each startup: the same tracking, just under a different official name and justification (see the sketch after this list).
- Sentry-style collection of stack traces and exceptions for errors and warnings. With enough warnings triggered across the code (logged only to Sentry, not to stdout), this can yield the same amount of analytical information as regular tracking.
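For the first item, a harmless-looking update check already gives the server most of the same signal: it sees who asks, how often, and with which version. A rough sketch, with a made-up endpoint:

```python
import platform
import urllib.request

# Hypothetical endpoint: illustrative only.
UPDATE_URL = "https://updates.example.com/latest"


def check_for_updates(current_version: str) -> None:
    # Nominally this only asks "is there a newer release?", but the query
    # string plus the caller's IP address and request frequency give the
    # server a per-install usage log: active installs, version and OS split.
    url = f"{UPDATE_URL}?version={current_version}&os={platform.system()}"
    try:
        latest = urllib.request.urlopen(url, timeout=2).read().decode().strip()
        if latest != current_version:
            print(f"A newer version is available: {latest}")
    except OSError:
        pass  # never fail the app because of a version check
```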
Sneaky tracking methods do work, but be prepared for uncomfortable questions from your users.
In Metarank, we try to handle this delicate question as nicely as possible:
- We collect both usage analytics and error reports; each can be toggled separately in the config file, and both are described in a dedicated chapter of the docs.
- The analytics payload contains no IP addresses and no personal information (see the sketch after this list).
- The API accepting the payloads is open-source and does not store IP addresses.
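As an illustration of what “no IP addresses and no personal information” means in practice, an anonymized event boils down to something like this (field names and values are a sketch, not Metarank’s actual schema):

```python
# Illustrative anonymized usage event: a random install id, version and coarse
# environment info, but no IP address, hostname, account data or request
# contents. Field names are made up, not Metarank's actual schema.
usage_event = {
    "instance": "9b2f6c3a-4e7d-4c1a-9f2e-8a1b3c5d7e9f",  # random UUID, not a person
    "version": "0.5.x",
    "os": "Linux",
    "state_store": "redis",                    # which backend is configured
    "features": ["ranking", "recommendations"],
}
```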