What the Tech?!

Faranak Sharifi

Site Reliability Engineer at ecobee

Interviewer: Steven Zhang

Q. Before we get started with the interview, could you please quickly introduce yourself?

My name is Faranak. I've been working as a Site Reliability Engineer at ecobee for almost two years now. I started my internship at ecobee and I'm [working] full-time right now. [My] fun fact is, since the pandemic, I've been listening to a lot of history podcasts. 

 

Q. In your opinion, what are the most important qualities for a successful Site Reliability Engineer?

I think one of the most important qualities would be persistence. I think this also applies to developer roles. And I say this because, especially when you're new, a lot of the time, you're gonna be dealing with errors, troubleshooting, and debugging. You really need to get good at that, so you always need to be motivating yourself to figure things out. I think how Site Reliability differs from software engineering is [that in] software engineering, you can have your own environment, where you can code and then test. [But] with Site Reliability, you're in production. That’s the nature of Site Reliability. [It] is about things that are already deployed, or in production, so you need to be a little careful. Make sure that you are 100% confident about what you want to do, that you've tested everything, and that it won't negatively affect production. Those are some of the qualities that I think are important.

 

Q. You mentioned that you wanted to tell us a bit about what a Site Reliability Engineer is–I know, it's a little off script, but if you want to give us a quick explanation on that, go ahead!

Yeah, so Site Reliability comes from DevOps. Back in the day, developers wrote the code, and then operations people deployed the code. So that means that as soon as the software was packaged and ready to be given to customers, it was deployed to production and handled by the operations team. That led to the whole field of DevOps, where the people that write the code also deploy the code. Right now you can see, across the industry, a lot of DevOps roles. And so Site Reliability comes from DevOps. It's a more specific type of DevOps. We usually say, “class SRE implements DevOps,” as in it's a more specific type of it and it deals with reliability a little more, like having SLOs, which are Service Level Objectives with your customers–having an agreement between you and your customers about how reliable the services are. SREs believe that the most important feature of a product is its reliability, [whereas] developers might tend to think like, “oh, it's new features.” But if you have a service that you're using, it is more important that it's reliable than, you know, having features that might not work all the time. That's kind of what SRE is and how the field is currently.

 

Q. What is something you enjoy about being a Site Reliability Engineer?

 

I think what I like is that it involves a lot of problem-solving. Every day, you're likely dealing with a new challenge. It can be tiring over time, but once you figure something out or you finish your project, it feels really good. You feel a sense of accomplishment because of that. In school, when we learnt about software, it was mostly about programming languages or math, but SRE [specifically] is about the whole picture–servers, the hardware side of things, networking, and all these other parts of the whole system. So just learning more about that has made me more passionate about the whole thing.

 

Q. What are some examples of day-to-day tasks one might do as a Site Reliability Engineer?

Usually, at least for myself, you're working with cloud tools–so a lot of GCP, Kubernetes, Terraform. I think one of the things that also comes up is just meetings, but the meetings are usually about new tools that the team can adopt. Also, a whole aspect of SRE is monitoring. [Like] monitoring the state of your application using tools like Prometheus and Grafana. Those are just examples of day-to-day tools that we use.

 

Q. What is a common misconception about being a Site Reliability Engineer?

I think, in school, most people don't know what [SRE] is, so it's not like you have a misconception of something you don't know about. But what I've gathered at work is that it seems super cool but also intimidating, because most people don't really know enough about it to be able to figure it out. So I think it seems like it's this scary or intimidating thing, [but] it's not really, it’s just skills that you can learn.

 

Q. What are some of the biggest challenges of being a Site Reliability Engineer?

I think one would just be that it can be stressful sometimes because you're always kind of dealing with production and the reliability of the services. And one part of being an SRE is being on-call. That usually means that if something breaks, you need to know how to respond, resolve the problem, and communicate that to your stakeholders. So that can be a little stressful, especially because it can be during your off hours. I think the other thing would be diversity. I do find that it is hard to have diversity in tech, but SRE is actually one of the fields in computing that has the least amount of diversity in the whole industry. There's more diversity in development, or data science, but SRE has the least, and the more senior you are, the less diversity there is.