This post discusses the high level tech stack decisions for our first product Segments by Tresl, and how we got there.
Tresl uses a hybrid approach of consulting and product to connect businesses with data science. Consulting gives us a direct channel to customers; we get to solve real problems for real businesses. Product, on the other hand, takes recurring use cases and scales the solutions so thousands of businesses can take advantage of what we’ve learned.
Our first product is based on our experience working with e-commerce and, more generally, online businesses (e.g., SaaS). Businesses interact with their users across disparate systems: their blog, product signups, email newsletters, paid marketing channels, and so forth. That’s a lot of context to keep track of across a lot of different systems. Our product Segments connects these disparate systems to create a complete view of a business’s users and automatically segments them, enabling personalized messaging for each user and thereby delivering more value. If you’re interested in working with us during the beta, let us know!
On the implementation side, we used four main criteria for deciding on our tech stack:
- Familiarity of language
- Compatibility with data science stack
- Speed of iteration
- Durability until next phase
Python 3 (Base language) As a bootstrapped startup, our runway is limited, so now isn’t the time to learn the hottest new JavaScript stack. We have regularly used Python in our client work and past projects, so it was the natural choice as the base language for most of our work. The library ecosystem has also matured to the point where Python 3 (3.6) was recommended over Python 2 almost everywhere we looked.
Most of our projects use Python for tasks including web scraping, entity matching, data enrichment, and modeling. While language interoperability is feasible via APIs or wrappers, the easiest solution is to just stay within the same programming space. Fortunately the Python space is vast and the libraries mature.
Flask (Web framework) We want to stay as lightweight as possible, especially in these initial phases, so we chose Flask over a more full-featured framework like Django. Our product is guaranteed to change immensely and may even be rebuilt completely from one phase to the next. Fewer features with less overhead is therefore preferable to more features and the maintenance they entail. The Flask ecosystem has tons of add-ons we can bolt on as features are needed.
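To illustrate just how lightweight that starting point is, here is a minimal Flask app; the route and response payload are invented for this sketch, not taken from our actual codebase:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    # A trivial endpoint; real routes would render templates or return data.
    return jsonify(status="ok")

if __name__ == "__main__":
    app.run()  # development server only; see the deployment section below
```

From this base, add-ons like Flask-Login or Flask-SQLAlchemy can be bolted on one at a time as the product demands them.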
PostgreSQL (Database) Segments collates a business’s user touchpoints and events throughout their journey with the business. It’s a lot of data, but we don’t expect it to be so much data that we need something other than a “plain ol’” relational database. We use SQLAlchemy with defined schemas and database migrations to help interface with our PostgreSQL database.
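As a rough sketch of what defined schemas look like with SQLAlchemy’s declarative models — the table and column names here are invented for illustration, and the engine URL would point at PostgreSQL in production rather than the in-memory SQLite used below:

```python
import datetime

from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class UserEvent(Base):
    """One touchpoint in a user's journey (hypothetical schema)."""
    __tablename__ = "user_events"

    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, nullable=False)
    event_type = Column(String(64), nullable=False)  # e.g. "signup", "purchase"
    created_at = Column(DateTime, default=datetime.datetime.utcnow)

# In production: create_engine("postgresql://user:pass@host/dbname")
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
```

Schema changes are then handled as migrations (e.g. with Alembic) rather than by editing tables by hand.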
jQuery and CSS (Frontend) We avoided frontend frameworks like React, Vue, or Angular because they are overkill for this phase of Segments (and because they would take time to learn). A simple CSS grid framework (like Bulma) and jQuery are enough to do exactly what we need without the cost of maintaining a more feature-rich framework, with the trade-off of more technical debt and less future extensibility. If Segments makes it to the next phase of market fit, we expect a lot of these pieces to be refactored or migrated.
Nginx and Gunicorn (Deployment) Remember, we are building the first iterations to validate market fit, not a scalable continuous-integration masterpiece. A regular Linux server from DigitalOcean, with Nginx as a reverse proxy in front of a Gunicorn-managed application, is fast, easy, and scalable enough for our expected amount of traffic. Monitoring apps like WebGazer or Hyperping are enough to check our deployments and alert us of any unexpected downtime.
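A minimal version of this setup might look like the following; the socket path, domain, and worker count are placeholder values for the sketch, not our actual configuration:

```nginx
# Hypothetical Nginx server block: proxy requests to Gunicorn over a Unix socket
server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://unix:/run/segments.sock;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

The Flask app would then be started with something like `gunicorn --workers 3 --bind unix:/run/segments.sock app:app`, leaving Nginx to handle static files and TLS.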
Google Analytics and Search Console (Tracking) Finally, we would be remiss as a data science company if we didn’t mention a tracking solution. We use Google Analytics and Search Console as a way to easily track onsite behavior and organic discovery, respectively. We don’t need any custom tracking solutions for the time being, but expect to have KPI monitoring and reports as soon as we hit any level of market fit.
And that’s our stack — each layer chosen to allow for fast iterations that can get us to the next phase of market fit, where we will likely have a different set of criteria depending on the level of adoption. Questions or comments welcome!
Update: In late 2018, we switched our stack to a "serverless" architecture similar to that detailed in this post.