
This is a software-generated (AWS) transcription and it is not perfect.
Becoming a data scientist was not really something that I wanted to do at the beginning. I always wanted to be a scientist, not a data scientist. I was doing physics. I have a PhD in theoretical physics, computational theoretical physics, and I graduated in 2011. I did my postdoc at Berkeley for three years, and I also went back to Ohio State to teach for one semester. I was thinking I was on track to be a faculty member at some university doing physics. That was the dream at the time. But only a few tenure-track faculty positions were available. In 2014, the closest thing to physics that I could think of was data science, because a lot of the modeling that I did can be applied to data science problems. As theoretical physicists, we build models, even though those models are not called deep learning or anything like that. We come up with our own model and usually use experimental data to fit the model and estimate its parameters. So the practices are somewhat similar. The difference is that we were working with limited data, because the experimental data we used came from a particular experiment, but the principle is about the same.

The hardest thing when I made the transition was, I guess, learning Python. That may have been the hardest thing, because in terms of math and statistics we had already been trained while doing physics. Doing Python the right way was a little bit challenging at that moment; it took me maybe two months just to become familiar with it. I was lucky to be able to make that transition so early in the lifetime of data science, because in 2014-15 the need for data science in industry was just starting to emerge, and at that moment companies needed people who could do modeling, not necessarily people with a degree in computer science or machine learning or data science. Nowadays it seems to be harder, because many Ivy League universities have started to take advantage of this demand by introducing online degrees and things like that. If you don't have some kind of degree in data science or machine learning, it's really hard right now to jump into the field. So I was lucky enough.
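A minimal sketch of the physics workflow described above, fitting a hand-built model to experimental data to estimate its parameters; the exponential-decay model and the synthetic measurements here are assumptions for illustration, not anything from the interview:

```python
# Toy illustration: define a model, then fit its parameters to data.
# The decay model and synthetic "measurements" are assumed, not real.
import numpy as np
from scipy.optimize import curve_fit

def model(t, amplitude, decay_rate):
    """A hand-built model, here a simple exponential decay."""
    return amplitude * np.exp(-decay_rate * t)

# Synthetic stand-in for experimental data: true curve plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 5, 50)
measured = model(t, 2.0, 1.3) + rng.normal(0, 0.05, t.size)

# Fit the model to the data to estimate the parameters.
params, covariance = curve_fit(model, t, measured, p0=(1.0, 1.0))
print(f"estimated amplitude={params[0]:.2f}, decay_rate={params[1]:.2f}")
```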
I'll give you a little bit of background on what HP is doing. I'm in the printing division; HP has two big divisions, printing and personal systems. There are other emerging areas, like 3D printing and graphics, but those have not really picked up as businesses yet. On the printing side of HP, the business model is changing: we are trying to be more contractual than transactional. Contractual means something like selling printing as a service, not selling the printer itself. It means that most of the maintenance costs, the repair costs, and all those things fall on us or on the channel partner who is taking care of the device. So the idea was to reduce the service cost per page, which means reducing maintenance costs. If you know ahead of time what is going to happen to a printer, you can predict which part of the printer is going to fail.

So basically what we're building is a part-failure prediction model for a printer, so we can know ahead of time which part in a printer is going to break. Then we can tell the channel partner who is taking care of that printer, so they can have the part ready and schedule an engineer or technician to go and visit. That can reduce the cost of maintaining those devices. So today, basically, we are looking at the repair data and failure data and building models. The thing is, because HP is really new to this machine learning and data science space, there is a lot of infrastructure work that needs to be done. So it is not only building the model, which is the fun part of doing data science, but also consulting or advising on things like building the data pipeline and creating data analytics. Another important thing is that because we create a live machine learning model, we require a lot of infrastructure around it to make it happen. All of those things are what I have been doing so far, I guess.
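A minimal sketch of what such a part-failure prediction model could look like in Python with scikit-learn; the telemetry file, feature columns, and 30-day failure label are all hypothetical stand-ins, not HP's actual data or code:

```python
# Hypothetical sketch of a part-failure prediction model. The file name,
# column names, and label definition are assumptions for illustration.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Hypothetical per-device telemetry joined with repair records.
df = pd.read_csv("printer_telemetry.csv")
features = ["pages_printed", "jam_count", "days_since_service", "error_rate"]
X = df[features]
y = df["part_failed_within_30d"]  # 1 if the part failed in the next 30 days

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Train a classifier to flag devices likely to need a given part soon,
# so a technician can be scheduled before the failure occurs.
model = RandomForestClassifier(n_estimators=200, class_weight="balanced")
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```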
I guess it's strongly dependent on the company. I've been with four different companies. For example, at Overstock they liked to use Scala, and they wanted to run it on the Databricks framework. But at HP, because we are new, the programming language we're using is mostly Python, and on the printing side we are still using a local machine right now. So I train my model on my local machine. The good thing is, I have a very powerful machine. But then again, we are building that infrastructure slowly; it strongly depends on the funding. For the first two questions, about software, programs, and frameworks, it depends on where you work and what framework the team has already been using. For models, algorithms, and languages, I mentioned Python. For models and algorithms, my principle is that the simplest model that can answer the business question is the one you go ahead and use, because the complexity of the model usually affects its performance when you put it into production, and we care about efficiency. Every time we run a prediction, we pay Amazon some money, right? So it is crucial to have a simple model with really good performance. There is a tradeoff we need to make with the simplicity of the model, but we also don't want to sacrifice the performance of the prediction.
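A quick sketch of that simplicity-versus-performance tradeoff, comparing a simple and a more complex model on both predictive quality and per-prediction latency; the synthetic data, the specific models, and the AUC metric here are illustrative assumptions, not the ones used at HP:

```python
# Compare a simple and a complex model on accuracy (AUC) and on the
# time it takes to score the test set, using synthetic data.
import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("random forest", RandomForestClassifier(n_estimators=300)),
]:
    model.fit(X_train, y_train)
    start = time.perf_counter()
    scores = model.predict_proba(X_test)[:, 1]
    latency = time.perf_counter() - start
    # If the AUC gap is small, the cheaper model wins in production,
    # since every prediction served costs compute money.
    print(f"{name}: AUC={roc_auc_score(y_test, scores):.3f}, "
          f"predict time={latency:.3f}s")
```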