I hear this question almost every month from at least one product engineer (especially new hires) on the team. In this blog post, I will try to answer it from my experience of doing production support at Gojek, an Indonesian superapp with 18+ products.
Currently, I lead a team called IronBank, which is responsible for the profitability of the Transport product. Transport is responsible for moving people from Point A to Point B in 4 countries: Indonesia, Singapore, Thailand and Vietnam.
We don’t have dedicated production support folks (yet). From the Transport group, we have 3 product engineers on production support, rotated weekly. So this blog post is written in the context of a company with decent scale and no dedicated SRE folks.
My initial days at Transport
On my first day, I saw Gojek go down for ~40 minutes because the Google Maps Directions API started returning HTML in its responses 🙂 . It was an intense day. Almost everyone in the company was on the production support call. A few folks were handling communication with customer support, keeping them up to date on the issue. A few others were trying to reach Google support to get an ETA on the fix. Two engineers from Transport were building a fallback mechanism for the Google Maps Directions API. At that time, Transport was quite lean: the product team had around 7 product engineers (including frontend) and two PMs. After 2 days of “orientation”, my mentor asked me to do production support 🙂
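To make the idea of a fallback concrete, here is a minimal Python sketch. It is not the mechanism the Transport engineers built that day; the `fetch_directions` callable, the `distance_km` field and the straight-line estimate are all assumptions made up for illustration. The point is only that a response which fails to parse as JSON (for example, an HTML error page) should degrade to a rough answer instead of failing the booking flow.

```python
import json
import math


def haversine_km(origin, destination):
    """Great-circle distance in kilometres between two (lat, lon) pairs."""
    lat1, lon1 = map(math.radians, origin)
    lat2, lon2 = map(math.radians, destination)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def route_distance_km(fetch_directions, origin, destination):
    """Return (distance_km, source).

    fetch_directions is a hypothetical callable that returns the provider's
    raw response body as a string. If the call fails, or the body is not the
    JSON we expect (e.g. an HTML page), fall back to a straight-line estimate.
    """
    try:
        body = fetch_directions(origin, destination)
        payload = json.loads(body)  # raises ValueError on HTML
        return float(payload["distance_km"]), "provider"
    except (ValueError, KeyError, TypeError, OSError):
        return haversine_km(origin, destination), "fallback"
```

Calling `route_distance_km(lambda o, d: "<html>error</html>", a, b)` takes the fallback path, while a body such as `'{"distance_km": 12.5}'` is served from the provider. A straight-line distance underestimates real road distance, so a real system would likely apply a correction factor or switch to a secondary provider.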
My production support experience
Initially, I struggled a lot, mainly for three reasons: 1) Gojek didn’t have any process around production support, 2) Gojek’s architecture and scale were huge, and 3) I didn’t have any real experience of doing production support. Long story short, I was on production support for almost 2 months, and I would say those days helped me grow as an engineer. The learning curve was steep.
Below is the list of things I learned during production support:
- I developed customer empathy
  - It was very emotional for me to see thousands of drivers not getting bids/bookings because one of our database VMs had restarted 😦
  - One day at around 5pm Jakarta time, thousands of customers were stuck at their offices because we had forgotten to change the data type of one column from integer to bigint (integer overflow)
- I learned about Gojek’s alerting and monitoring infrastructure
- I learned how to write good RCAs (root cause analyses)
- Production support helped me spot the architecture issues in our systems, which in turn helped me identify tech debt items
- I met a lot of (good) product engineers in the company during production support and learned a lot from them
- I made tonnes of memories while doing production support. It was a fun time
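The integer overflow incident in the list above is the kind of failure that can be caught early with a simple check. Here is a minimal sketch, assuming a signed 32-bit integer column; the function names and the 80% threshold are hypothetical, and how you read the current maximum value depends on your database.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647: largest value a signed 32-bit column can hold


def int32_usage(current_max_value):
    """Fraction of the signed 32-bit range already used by a column."""
    return current_max_value / INT32_MAX


def needs_bigint_migration(current_max_value, threshold=0.8):
    """True once the column has used at least `threshold` of its range.

    The 0.8 default is an arbitrary illustrative choice; pick a value that
    leaves enough time to migrate the column at your growth rate.
    """
    return int32_usage(current_max_value) >= threshold
```

Running a check like this on a schedule and wiring it into alerting turns a surprise outage into a planned migration.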
Published by Deven Bhooshan
Food-lover, Programmer, Web Developer.