> andrew_dryga
Projects
Blitz
Single-handedly owned ~20 high-traffic backends end-to-end (77 Elixir umbrella apps, 70+ Redis/50+ PostgreSQL instances) serving ~25k RPS with peaks to 120k for a 7-figure DAU product.
Firezone
WireGuard-based replacement for legacy VPNs. Re-architected and developed key components of the enterprise product. Led infrastructure as code with Terraform on GCP. Open source, YC W22.
eHealth: National Health Service of Ukraine
Co-designed and built the national platform behind reimbursements, EMR, e-prescriptions, and nationwide APIs for clinics and pharmacies. Led architecture, security, hiring, and hands-on Elixir + DevOps. All development open-sourced under Apache license.
Hammer Corp
Advertising platform for thousands of US automotive dealerships: ingests inventory, syndicates ads to major channels (at peak accountable for 30%+ cars on Facebook Marketplace), measures conversion, and collects leads to a unified interface with 24/7 human first-responder reps answering within 60 seconds.
Bullpen - Virtual Sales Floor + CRM
When COVID hit, our sales team lost the buzz of the office. We built a platform that brought it back - a CRM with virtual space where reps could collaborate, learn from each other in real time, and keep the same drive. Then we turned it into a standalone product with AI sprinkled around it.
TalkInto - Omnichannel Messaging Platform and CRM
Messaging/voice backbone powering products like Hammer, Bullpen, and Text2Buy: SMS, voice, various chat integrations, and web chat with clean agent UI and APIs. Features included local numbers, call recording, and routing.
Contractbook
Built the self-service billing system, B2B API, and marketing pipeline that kicked off the business's growth.
Financial P2P Marketplace
Architecture and implementation for an institutional P2P lending marketplace for one of Europe's largest lenders ($9B portfolio).
Mbill - P2P Transfers
P2P transfer service for individuals and small-to-medium online merchants. Create a page for your card and share a link to receive payments. Includes customer cabinet, payment button constructor, and transaction reports.
Mastercard MoneySend
Front-end application to receive P2P transfers sent via recipient phone number. Country-wide rollout of phone-number-based transfers.
Forza - PayDay Loan Websites
Front-end, SMS gateway, decision engine, and marketing tools for an online lending originator operating in Moldova, Bosnia, and North Macedonia.
Best Wallet (ex. MBank)
eWallet cloud for worldwide money transfers. For B2C: pay for 2,700+ services across CIS, send money to phone numbers, cash out via partnered banks or cards. For B2B: free SaaS white-label eWallets for banks with simple integration.
IPSP.com - Payment Pages
Responsive landing and payment pages for an Internet Payment Service Provider. Improved conversion on payment flows via lighter UI.
ECommPay - Mobile App
iOS and Android business application for partners to manage payment platform on the go.
Autopayment
Automatically pays for bills based on two types of rules: by threshold of supplier balance (e.g., mobile top-up) or on a periodic basis.
Mobile Cashier
Turns Android devices into payment terminals for deposits and top-ups across numerous service providers, from cellular carriers to credit card loan repayments.
Sage - Sagas Pattern in Elixir
Dependency-free implementation of the Sagas pattern for distributed transactions with explicit compensation. Guarantees that either all transactions complete successfully, or compensating transactions amend partial execution.
LoggerJSON
Structured JSON logging for Elixir with first-class formatters for Google Cloud Logging, Datadog, and Elastic (ECS). Drop-in :logger formatter/handler with runtime config helpers.
Confex
Runtime configuration from environment variables with type casting and adapters (:system, :system_file). 12-factor friendly configuration for Elixir applications.
Elixir Bench
Continuous benchmarking platform for the Elixir ecosystem. Automatically runs performance benchmarks on each commit to detect regressions and track language performance improvements over time. Won Spawnfest 2017 and later accepted into Google Summer of Code.
Annon API Gateway
Configurable API gateway acting as a reverse proxy with a plugin system (ACL, Auth, Validation, CORS, Idempotency), request/response storage, metrics, management UI, and auth provider. Reduces boilerplate across services.
Ecto Mnesia Adapter
Ecto adapter for OTP's built-in Mnesia database that works in the same memory space as the application, providing extremely low latency without deploying a separate database.
Gandalf - Decision Engine
Open-source decision engine SaaS for rule tables, champion/challenger split testing, revision history, decision analytics, and debugging tools.
Man - Template Rendering Engine
Stores iex, mustache, or markdown templates and renders them with localization to HTML or PDF via REST JSON API. Includes an easy-to-use management UI. Free one-click deployment to Heroku.
Vagrant Box OS X
macOS Vagrant boxes for VirtualBox. Run UX tests or build iOS/Mac applications on any machine with a few CLI commands. Used by many teams worldwide, including Boxen.
Parasport - Foundation Portal
Medium-sized web portal for a foundation supporting Paralympic sport, physical rehabilitation, and social adaptation. Built on October CMS.
OneDayOfMine
Storytelling social network that helps see other people's lives through their eyes. Capture moments through the day and share them with descriptions - from special forces in Belarus to a family visit to a film museum in South Korea.
L15 - Night Club x Coworking
Experimental mix of coworking space and a night club ('clubworking') in Kyiv. Turned the office into a best-in-class night club and ran terrace events with world-class DJs every weekend for an entire summer.
Happy Customer
Outsource project to motivate small and medium-sized businesses to provide better customer service via public feedback and simple tracking.
truBrain 1.0
An early-stage product that needed help. I took some swings at UX and performance for free because I wanted to see them make it.
Blog
The Real 10x Engineer
The real multiplier in software isn’t writing more code. It’s judgment: choosing the right problems, avoiding unnecessary systems, and reducing the maintenance burden that slows teams down.
Introducing Sage - a Sagas pattern implementation in Elixir
Distributed transactions are hard and expensive. If you wonder how to handle them pragmatically in a mid-size project, this article is for you.
Run stale tests on file change in Elixir
Mix is an awesome tool but most Elixir beginners are not aware of all its features. mix test --stale is one of them and can make your workflow much better.
Runtime configuration, migrations and deployment for Elixir applications
Shortly after moving from PHP to Elixir, I faced a common issue: the way we deploy applications is totally different from the one I was used to.
National Health Service, on Elixir and Kubernetes
A look at building Ukraine’s national-scale eHealth platform with Elixir, Kubernetes, and pragmatic architecture for reliability and scale.
Bringing blockchain properties to centralized government databases
Making it cryptographically impossible to alter records in a database even with full system access.
Alternative approach for sensitive file uploads
Using signed URLs for secure file uploads directly to cloud storage, bypassing your application servers entirely.
Designing a P2P Lending platform with Elixir in mind
With this post, I want to share with you the design process on one of our latest projects - a P2P marketplace that was intended to be used by hundreds of thousands of users.
AdTech Feb 2025 - Present

Blitz - Overlays, Personalized Stats, and Meta Insights Powered by Billions of Matches

Elixir Terraform GCP Kubernetes Redis PostgreSQL Cassandra TypeSense Riak HAProxy

The Ask

When I joined, the business was in a tough spot. Leadership had started to rewrite everything in Rust to fix performance issues, operational costs were through the roof, and development had basically stalled. The ask was simple: keep things running while the rewrite happens. The scale was pretty substantial - we served 25k RPS with peaks around 150k RPS (and at times LLM bots and other scrapers tried to push as much as 1M RPS), with tens of terabytes of data per day, for a 7-figure DAU.

But here’s the thing - after digging in, I realized the language and framework weren’t the problem. It was infrastructure and lack of proper care. So instead of babysitting a dying system, I went all-in on fixing it.

Oh, and there was one more thing: the last backend engineer left a week after I joined. He stayed long enough to onboard me, but most of the internal knowledge walked out the door. So I was flying solo, reverse-engineering a system that nobody fully understood anymore.

What Was Actually Broken

The Backend Was Drowning in Errors

The Elixir apps were stable enough, but they were generating tens of thousands of Sentry events every day. That required a pretty sizeable Sentry cluster and sometimes took it down, which meant manually resetting Kafka queues. Most of it was noise: race conditions, failed inserts where upserts should’ve been used, unmaintained databases, and an overcomplicated architecture nobody fully understood.
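To make the "failed inserts where upserts should’ve been used" kind of noise concrete, here is a minimal sketch. It is not the original code: the real databases were PostgreSQL, while this uses SQLite from the Python standard library only so that it runs anywhere (both support an `ON CONFLICT ... DO UPDATE` clause). The `player_stats` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE player_stats (player_id TEXT PRIMARY KEY, matches INTEGER NOT NULL)"
)


def plain_insert(player_id: str, matches: int) -> None:
    """Raises IntegrityError if another writer already created the row."""
    conn.execute(
        "INSERT INTO player_stats (player_id, matches) VALUES (?, ?)",
        (player_id, matches),
    )


def upsert(player_id: str, matches: int) -> None:
    """Inserts the row, or updates it in place if it already exists."""
    conn.execute(
        """
        INSERT INTO player_stats (player_id, matches) VALUES (?, ?)
        ON CONFLICT (player_id) DO UPDATE SET matches = excluded.matches
        """,
        (player_id, matches),
    )


upsert("p1", 10)  # first writer creates the row

try:
    plain_insert("p1", 11)  # second writer loses the race and errors out
    insert_failed = False
except sqlite3.IntegrityError:
    insert_failed = True

upsert("p1", 12)  # the same write expressed as an upsert just succeeds
final_matches = conn.execute(
    "SELECT matches FROM player_stats WHERE player_id = ?", ("p1",)
).fetchone()[0]
```

With a plain insert, every lost race becomes an exception and an error-tracker event even though nothing is actually wrong. With an upsert, the losing writer updates the row and no error is raised.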

I went heads-down for a month:

  • Added NIFs for JSON encoding (at our RPS, encoding/decoding JSON is expensive);
  • Mapped out internal dependencies and ripped out dead code;
  • Fixed race conditions one by one;
  • Did a maintenance pass on the billing system, which unlocked annual subscriptions and drove substantial growth in paying customers;
  • Removed places that were doing unnecessary remote RPC calls or inserting data into Riak just so that a later Oban job could read its payload;
  • Later removed the beefy Riak cluster - it was barely used and locked us into an old Erlang version;
  • And did the most basic thing - started every single day by just triaging Sentry, fixing issues, until there was nothing left.

I can’t thank Elixir and Erlang enough for their outstanding debugging story. Being able to remote shell into an IEx console, run :recon_trace, inspect production state, or even hot-patch a fixed module directly into production saved my ass countless times. You can’t do this stuff in most languages - it’s a superpower when you’re firefighting solo. This is one of many reasons why Elixir is my language of choice!

Result: Error rate dropped from six figures to ~42 events per day (mostly timeouts, which are occasionally fine at our scale).

Infrastructure Was a Mess

Manually provisioned, overprovisioned in some places, underprovisioned in others. Tons of stuff running that nobody used, but nobody could figure out if it was safe to remove because the codebase was so tangled.

Since I already knew how the backend worked, I started by stopping everything unused. Then I spent three months moving everything to Terraform - the only viable way to manage this solo. Along the way, I right-sized everything: merged Redis instances, moved caching to ETS where it made sense, cut the cluster size and cloud bill substantially (saved seven figures annually).

One almost comical example: a 6-node TypeSense cluster was supposedly “overprovisioned for performance”, and a hand-written Rust search service had then been written to replace “slow TypeSense” - except 4 out of 6 nodes had been down for at least 5 months and were many major versions behind. I fixed it, downscaled to 3 nodes, and later recreated the whole thing from scratch on Terraform-managed infra with proper alerting, fully migrating the data along the way. We removed the hand-rolled service and have had zero issues since.

Monitoring Was Noisy and Expensive

The monitoring stack was decent in theory - everything was monitored. But alerts were misconfigured and fired multiple times a day even when nothing was wrong. The stack ran on Mimir hosted on its own 42-core, 160GB RAM, 12-node Kubernetes cluster.

I migrated everything to self-hosted VictoriaMetrics and Grafana running on two small VMs, then reworked every dashboard, panel, and alert to eliminate noise.

Savings from monitoring alone: $50k/year.

I also found we were sending tons of metrics directly to Google Cloud Monitoring that were never queried, costing thousands per month just for storage. I disabled all unused metrics. Same story with logs - at our scale, verbose logs aren’t that useful, but they were costing a lot to store. Never enable load balancer or HTTP request logging on high-load projects ;).

Databases Needed Love

Backend databases hadn’t gotten proper care in a while. Oban queues didn’t have reindexing and purging enabled, so queueing databases accumulated dead tuples and bloated indexes. Application databases had heavy queries running without proper indexes, and the default response to any performance issue was to upscale the database instead of tuning queries or looking at application behavior.

I spent a week going through each PostgreSQL instance, analyzing pg_stat_statements and the applications that used it, and tuning both the queries and the database itself. I removed unused indexes, added missing ones, rebuilt bloated indexes, and performed manual vacuuming during off-peak hours. Changing PostgreSQL flags in most cases reduced its load by 20% right away and allowed further downscaling. Proper autovacuum settings ensured that I don’t need to go back and do manual maintenance anymore.
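A small worked example shows why autovacuum settings matter so much for large, high-churn tables such as job queues. PostgreSQL vacuums a table once its dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * number_of_tuples`, with defaults of 50 and 0.2. The 10M-row table and the "tuned" values below are illustrative assumptions, not the settings actually used here.

```python
def autovacuum_trigger_point(
    live_tuples: int, threshold: int = 50, scale_factor: float = 0.2
) -> int:
    """Dead tuples that must accumulate before autovacuum vacuums a table.

    Defaults mirror PostgreSQL's autovacuum_vacuum_threshold and
    autovacuum_vacuum_scale_factor.
    """
    return round(threshold + scale_factor * live_tuples)


# Hypothetical 10M-row table with the PostgreSQL defaults.
default_trigger = autovacuum_trigger_point(10_000_000)

# Same table with illustrative (assumed) per-table overrides.
tuned_trigger = autovacuum_trigger_point(
    10_000_000, threshold=1_000, scale_factor=0.01
)

print(default_trigger, tuned_trigger)
```

With the defaults, about two million dead rows pile up before the first automatic vacuum. On a table that is constantly inserted into and deleted from, that is how bloat builds up. Lowering the scale factor for such tables makes vacuum runs smaller and more frequent.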

Networking Costs Were Out of Control

GCP CDN and Media CDN are extremely expensive, mostly due to traffic replication costs across regions. Cloud Security Policies (CSP) on GCP bill per request, which didn’t work for our business model either - we had far too many requests, and it was costing a fortune.

I initiated a migration to BunnyCDN, which saved $500k/year. BunnyCDN required a bit more hands-on work and interactions with support, but they responded within 10 minutes every time and were great to work with.

DataBricks Serving Was Slow and Expensive

We had a caching layer in front of DataBricks, but even with CDN in front of that, a small percentage of requests still hit DataBricks directly - pushing P99 latency for data requests into seconds.

I rebuilt the caching layer using DuckDB embedded in Elixir, and our data engineer (shoutout to Rafael) reworked the pyspark codebase to export parquet files to a GCS bucket. The new data layer watches for changes in those files, downloads them, and atomically swaps tables - never dropping a single request during updates. We stopped paying for DataBricks native serving (a few thousand per month) and P99 latency for data requests dropped to <100ms.
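The "atomically swap tables" idea can be sketched independently of DuckDB and Elixir. This is a language-neutral illustration in Python, not the real implementation. `TableStore`, the table name, the sample rows, and the `version` strings (standing in for something like a file's generation or checksum) are all hypothetical.

```python
import threading
from typing import Callable, Dict, List, Optional

Table = List[dict]  # stand-in for a table loaded from a Parquet file


class TableStore:
    """Serves reads from a snapshot that is replaced in a single step."""

    def __init__(self) -> None:
        self._snapshot: Dict[str, Table] = {}
        self._versions: Dict[str, str] = {}
        self._write_lock = threading.Lock()

    def get(self, name: str) -> Optional[Table]:
        # Readers grab the current snapshot reference once, so they see
        # either the old tables or the new ones, never a half-built mix.
        return self._snapshot.get(name)

    def refresh(self, name: str, version: str, load: Callable[[], Table]) -> bool:
        """Reload `name` only if `version` changed; True when a swap happened."""
        with self._write_lock:
            if self._versions.get(name) == version:
                return False
            new_table = load()  # slow part; old data is still being served
            new_snapshot = dict(self._snapshot)
            new_snapshot[name] = new_table
            self._snapshot = new_snapshot  # one reference assignment = the swap
            self._versions[name] = version
            return True


store = TableStore()
store.refresh("stats", "gen-1", lambda: [{"key": "a", "win_rate": 0.51}])
before = store.get("stats")

swapped = store.refresh("stats", "gen-2", lambda: [{"key": "a", "win_rate": 0.53}])
unchanged = store.refresh("stats", "gen-2", lambda: [])  # same version: no reload
after = store.get("stats")
```

The design point is that the expensive download and parse happens off to the side. The only change visible to readers is swapping one reference, so a request in flight during an update still gets a complete, consistent table.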

We’ve also deployed a self-hosted Airflow instance and are experimenting with on-demand self-hosted Spark clusters to eliminate our DataBricks dependency entirely.

Frontend Was Slow

Built on a home-grown React-based framework that sometimes took seconds to server-render a single page. It was a constant struggle to add new features, and improving performance was even harder.

I didn’t touch the FE much, but I kept raising issues and the need to take proper care of it. One of the other big issues was the number of JavaScript chunks the browser had to load per page view - 300 to 700 in some cases. With our request volume, that is a money-burning machine. The frontend team decided to rewrite everything in Svelte. It ended up being a great success!

A few small things I did change myself:

  • Removed large headers in generated JS chunks (at up to 700 JS chunks per page and our request volume, those headers were costing real money);
  • Internal routing to CDN assets was done on the JavaScript side, which meant we served way more traffic through Node.js than we needed to, leading to latency spikes and high CPU/RAM usage. I added HAProxy in front of the FE containers to route that traffic cheaply, with an additional level of CDN-like caching;
  • Migrated from Cloud Run to Compute Engine to reduce costs and get better control and easier debugging.

CI/CD Was a Gamble

The deployment system was a hand-rolled Slack bot that could accidentally ship a 6-month-old version of code to prod due to a bug. Nobody had deployed a fix to the entire cluster in over a year. Fixes were deployed from PR branches by manually editing CircleCI config in the branch, then removing the change before merging to master.

I moved everything to GitHub Actions ($24k/year saved) and built a proper CI/CD pipeline with Terraform. All apps now build containers and open PRs in the infra repo that auto-deploy when merged, with Slack notifications and full audit trails.


The Numbers

When I joined

  • Cloud spend: ~$220k/month
  • Daily errors: Six-figure Sentry events
  • Monitoring cost: $50k/year for a 12-node cluster
  • CI/CD cost: CircleCI + custom tooling
  • CDN cost: GCP CDN + CSP billing disaster
  • Deployment: Manual, risky, slow

After 1 year at the company

  • Cloud spend: ~$20k/month (91% reduction)
  • Daily errors: ~42 events (mostly acceptable timeouts)
  • Monitoring cost: Two small VMs
  • CI/CD: GitHub Actions + Terraform automation
  • CDN cost: BunnyCDN migration saved $500k/year
  • Deployment: One-click, safe, automated

Total annual savings: Over $2M.
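As a quick sanity check, the headline figures follow from the two monthly cloud-spend numbers above. This sketch uses only those numbers. Whether the itemized CDN, monitoring, and CI savings are already part of the cloud-spend delta is not stated, so they are deliberately left out.

```python
before_monthly = 220_000  # ~$220k/month when joining
after_monthly = 20_000    # ~$20k/month after one year

monthly_saving = before_monthly - after_monthly
reduction_pct = round(100 * monthly_saving / before_monthly)
annual_saving = 12 * monthly_saving

print(f"{reduction_pct}% reduction, ${annual_saving:,} per year")
```

The cloud-spend delta alone annualizes to roughly $2.4M, which is consistent with "over $2M".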


What I Learned

This wasn’t a language problem, a framework problem, or even really a scale problem. It was a maintenance debt problem. Systems don’t stay healthy on their own - they need care, attention, and someone willing to roll up their sleeves and fix the boring stuff.

The biggest wins came from:

  1. Measuring first - You can’t optimize what you don’t measure;
  2. Removing before adding - Dead code and unused infra cost money and mental overhead;
  3. Fixing root causes - Upscaling databases is easy, but designing with the database in mind, or at least running EXPLAINs and fixing bad queries, is what actually works;
  4. Boring infrastructure - Terraform, Compute VMs (no k8s or anything like that for a small team), proper CI/CD, and reliable monitoring aren’t sexy, but they’re worth their weight in gold;
  5. Focusing on what matters - Error budgets, SLOs, observable systems and simply caring during the day let you sleep at night.

The work proved that systematic infrastructure improvements can deliver both cost savings and reliability gains without sacrificing performance.

A Side Note on Rewrites

I’ll be honest - I’ve been guilty of chasing “the rewrite” in the past. It rarely works out. The new “fixed” platform almost never sees the light of day because the business can’t wait. Features need to ship, so you end up playing cat-and-mouse, implementing the same things in both the old and new systems. It’s exhausting, expensive, and demoralizing.

I’ve spent a lot of time thinking about this, and I keep coming back to this analogy:

What would you say to a plumber who tells you that in order to fix a few leaky pipes, he needs to tear down and rebuild your entire house?

Yeah. Exactly.

Most of the time, the best move isn’t a rewrite - it’s rolling up your sleeves and taking care of what you already have. Fix the leaks, replace the bad pipes, and keep the house standing. The business will thank you for it.


Thanks

Huge thanks to Austin Reifsteck for staying an extra week to onboard me and transferring as much knowledge as he could in that short window. It made all the difference.

And to Naveed Khan and the rest of the team - thank you for trusting me, letting me go wild, and giving me the space to change things aggressively without micromanagement. Not every team would tolerate my “move fast and fix things”/YOLO approach, and I’m grateful you did.
