Walmart Technology Lead Machine Learning Engineer/Data Scientist at Walmart Technology
University of Illinois at Urbana-Champaign Doctor of Philosophy (PhD), Neuroscience

How did you get to where you are today? What is your story? What incidents and experiences shaped your career path?

Summarized By: Jeff Musk on Tue Jan 28 2020
I started my career doing an undergrad in computer science at Texas Tech. After leaving Texas Tech, I took a job as a software engineer at a small company that did automotive software. From my first day, it was a fairly terrible experience. I ended up lasting about three and a half weeks and then quitting that job, which turned out to be one of the best decisions I ever made. It was a little bit of a struggle for maybe four or five months after that, looking for what I wanted to do. Normally you arrange your job before your graduation, so you know what you're gonna do when you leave school. Having that sort of swept off the table was a little tough, but I ended up finding a really amazing job as an embedded vision systems engineer at a company called National Instruments, which does test and measurement. That was a great opportunity. The company really took a lot of time to invest in its employees, so I had a lot of chances to learn and to grow my skills, both communication skills and technical skills. Looking back, I think that choice to leave that first job was probably the most pivotal choice that I've ever made in terms of both my personal life and my career, because the job at National Instruments required me to move to another city, Austin, Texas, from Houston, where the first job was. Basically everything great that's happened to me since then has been shaped by things that went down there. When I moved to Austin, I started looking for nonprofit opportunities, places to volunteer, and found a great animal shelter in town that was treating a disease called parvo. I started volunteering there in my free time. I ended up meeting the person who would become my wife doing that volunteer activity. That's obviously pretty important to me. At the same time, I started delving into data analytics on behalf of the shelter. I came from a computer science background, so that wasn't terribly difficult.
That sort of set me off on a path of thinking about data science in a little bit of a different light, in a more practical setting than what I was seeing in my actual day job, which was around embedded vision systems. At the same time, I had always had a strong interest in how brains work and how human intelligence functions. Artificial intelligence is an area that I focused on during my undergrad. I was looking for opportunities to expand my knowledge in that area, so I read some textbooks and worked on expanding my fundamental knowledge. Ultimately, at some point, my wife decided she wanted to go to vet school. It had been a lifelong dream for her, so it wasn't a terribly hard decision, and we decided to apply to overlapping schools. I applied to computational neuroscience programs and she applied to vet schools. We only applied to places that had both of those options. Ultimately, we both ended up at the University of Illinois, and I started doing my PhD in neuroscience there, with a focus on computational cognitive neuroscience, which is just a fancy way of saying trying to write algorithms and math that reflect human behavior. I was studying people who have what's called hippocampal damage. If you've ever seen the movie Memento, that's basically what hippocampal damage looks like when you have it very severely: you can't remember things, and it's like you're always living in the now, but you never actually create new episodic memories. I took a bit of an artificial intelligence flair on all of that, and ultimately that led me to look for a more data-science-oriented position so I could do AI after I left grad school. It was a little bit of a tough decision not to stay in academia. Here is what ultimately led to my choice. I knew I wanted to come back to Austin, so I limited what I was applying to accordingly. There were a couple of really cool opportunities, even a lab in Austin that I was interested in going to.
But when it came to what I wanted my career to be going forward, I really wanted to grow my technical computational skills. I'd been spending a lot of time focusing on behavior and other things like that, so I wanted to get back to writing code every day. I picked the opportunity that gave me the best chance to do those sorts of things. When I got to Walmart, which is the job I started after my PhD, I found that in my office we work for internal customers. We don't work on consumer data or anything like that. I found all sorts of interesting problems, and I found there was a huge opportunity to contribute with the unique set of skills that I had been developing over the previous eight years of my career. Now I lead the team of data scientists here in the Austin office. I've got a group of about eight people doing data science, and I'm really enjoying all the challenges and opportunities that come with working for such a big company. I'm also still exploring some of the ideas related to my PhD on the side. I get a lot of freedom to work on projects that I think are interesting, and I'm taking that opportunity to continue to explore the theoretical aspects of intelligence, memory, and neuroscience. That's my story.

What are the responsibilities and decisions that you handle at work? Discuss weekly hours you spend in the office, for work travel, and working from home.

Based on experience at: Lead Machine Learning Engineer/Data Scientist at Walmart Technology, Walmart Technology
I like to describe my primary job as applying vision to a data science project. The way you learn data science is often by being handed a nice, clean, technical description of a problem. In practice, that technical description does not appear out of nowhere and is often very hard to dig up. So you have to be an effective communicator with non-technical people, acting as a translator for what their intentions and objectives are, and then figuring out what data sources might be necessary, getting access to those, and assembling all that together. Then, finally, you end up with some sort of problem specification, which is where the algorithms come into play. All those soft skills are things that you pick up as you gain more experience: how to ask the right questions, the politics of some of those things. I construct a vision for how a particular project should be executed in order to accomplish its objective, and that goes all the way down to the details. How are you gonna pick the model? How are you gonna deploy it to a cloud service? (We use a lot of cloud tools in our office.) What validation methods are you going to use? What are you gonna do if all of that goes wrong, if it doesn't work the way you expect it to? I lead on all of those aspects for the projects that come through the office, which also involves a lot of mentorship. Obviously I want to grow the people on my team so that one day they're able to do the things that I'm doing right now. That involves explaining why I'm making the choices I am and really working through the rationale of not just solving a data science problem, but solving a problem that is usually very ambiguous and ill-defined at the beginning. In terms of office hours and what my day looks like, the culture here is one that really supports you working the way that works best for you. I've never been a morning person. I hate waking up early in the morning.
Sometimes you still have to go to a meeting here or there. But on most days, I can get away with coming in at 10 or 10:30 and starting my day a little bit late, which lets me dodge traffic. That's a bit of an optimization I get to do in my life: I don't have to deal with rush hour traffic in the morning. Then I'll sometimes do the same on the way home, so I'll leave at 4 and do my last hour or two of work at home, getting to dodge the five o'clock rush hour traffic. There's a lot of flexibility in terms of office hours. If I need to take a day off, I can do that. There's not a strict mechanism through which I need to manage my time off; it's all about getting the job done, and I think the more senior you get, the more that's the pattern. If you learn to be efficient in managing your time, you get free time to do other things. For example, I still volunteer at Austin Pets Alive and work on data science and research for them. So that's how I would typically describe my schedule: it starts late, ends early, with a little bit of work from home as needed. In terms of travel, there are lots of travel opportunities. I tend to stick to the high-priority travel. I just got done doing a lot of travel. I went to Dublin to help start an office there doing data science. I got to go to the NeurIPS conference back in December, which was a wonderful conference experience and something that I'm encouraged to do if I have the time. Then there are sometimes smaller trips to meet with cloud providers or other things like that to try and spec out a project, so the travel is variable. I try to limit it to no more than 10 to 15% of my time. I do work from home often, once a week or sometimes once every other week, depending on what's needed for my job. I think those are important days to take, because when you're in the office, there are a lot of distractions.
There are a lot of things pulling you away from your programming or other goals. So taking a work-from-home day once a week is something that's pretty encouraged in our office.

What tools (software programs, frameworks, models, algorithms, languages) do you use at work? Do you prefer certain tools more than the others? Why?

Based on experience at: Lead Machine Learning Engineer/Data Scientist at Walmart Technology, Walmart Technology
For data science, we pretty much use Python for everything. It depends on the exact task, but scikit-learn is totally critical for any basic modeling effort. Algorithm-wise, random forest seems to come back up repeatedly as being, as far as simple models go, one of the better classifier methods. We do use AutoML frameworks. I think that's becoming a really nice way to check, if you build a hand-designed model, how well you're really doing. It's a good benchmark to compare what you're doing against. And if you have access to big cloud computing resources, AutoML is very quick and easy to train. That's the starting point in terms of tool stack. As complexity increases: we do a lot of natural language processing in our office, so we use things like spaCy, which is a great library for natural language processing, and different sorts of embedding methods. We really tend to stay near the cutting edge on stuff like embeddings, so we also use the work coming out of Google and Facebook and other places like that, as well as trying to build custom models in things like TensorFlow, et cetera, as needed. That's our typical tool stack on the data side. We do like deploying things on Azure, so Azure is where we tend to do a lot of our cloud compute, and my personal preference for a lot of these things is basically the stuff I just described. I do think that there are times when you need some additional optimization, and we'll use things like Scala or even C or Cython as needed for that. The reasoning behind Python is probably pretty self-explanatory if you're doing data science already, which is that that's where most of the tools are. It became the de facto standard during the past 10 years or so, so if you want to find a tutorial or a library to do something, it's probably gonna be in Python.
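As a rough illustration of the baseline workflow described above, here is a minimal sketch, assuming scikit-learn is installed. The synthetic data set, the parameter values, and the variable names (`rf_f1`, `dummy_f1`) are assumptions made for illustration and are not details from the interview.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; swap in your own features and labels).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Chance-level reference: predictions drawn from the training class distribution.
dummy = DummyClassifier(strategy="stratified", random_state=0)
dummy.fit(X_train, y_train)
dummy_f1 = f1_score(y_test, dummy.predict(X_test))

# Simple hand-built baseline: a random forest with mostly default settings.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
rf_f1 = f1_score(y_test, forest.predict(X_test))

print(f"chance-level F1:  {dummy_f1:.3f}")
print(f"random forest F1: {rf_f1:.3f}")
```

The score from an AutoML run on the same train/test split could be placed next to `rf_f1` in the same way, which is one reading of the benchmarking idea mentioned above.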
Then there are some reasons you might want to go to something like C#, as I said, for optimization purposes or for scalable enterprise-style code, and we'll do integrations as necessary for those other libraries. So when you're working with a software engineer, you may deliver them some Python code that they find a way to integrate via the command line or via a service API or something like that, as needed for the different projects. We have used H2O pretty extensively. It is not the easiest to work with of all the AutoML frameworks, but it is definitely very scalable. When you're working on giant data sets, as you can imagine at Walmart's scale, that is pretty critical. H2O has been one that we've consistently found success with. AutoML is an interesting choice, depending on the size of the data set you're working with and the level at which you're willing to be hands-off with things like validation of performance. There are some tools in GCP and Azure that do AutoML. We tend to stay away from ones that are highly coupled to the cloud provider, just in case we need to switch cloud providers at some point for some reason. And then for local AutoML on just your laptop or something, auto-sklearn is great and TPOT is great. Both of those libraries are ones that I've found pretty good success with.
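The hand-off mentioned earlier in this answer, where Python code is integrated via the command line, might look roughly like the sketch below. It uses only the standard library. The `predict` function is a hypothetical stand-in for a trained model, and the `--features` flag and JSON output format are assumptions chosen for illustration.

```python
import argparse
import json
import sys


def predict(features):
    """Hypothetical stand-in for a trained model: returns the mean as a score."""
    return sum(features) / len(features) if features else 0.0


def main(argv=None):
    parser = argparse.ArgumentParser(description="Score one record.")
    parser.add_argument(
        "--features",
        default="[]",
        help='JSON list of numbers, for example "[0.2, 0.9, 0.4]"',
    )
    args = parser.parse_args(argv)
    features = json.loads(args.features)
    # Emit JSON on stdout so a caller written in any language can parse it.
    result = {"score": predict(features)}
    print(json.dumps(result))
    return result


if __name__ == "__main__":
    main(sys.argv[1:])
```

A software engineer could then call the script from another process and parse the JSON line it prints, without needing to know anything about the model internals.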

What things do you like about your job? Were there any pleasant surprises?

Based on experience at: Lead Machine Learning Engineer/Data Scientist at Walmart Technology, Walmart Technology
I was pretty nervous starting, thinking that it is a giant company. I wasn't super clear on what the expectations were gonna be. We were at a new office at the time. A lot of how I became lead data scientist was that I realized that being new gave me an opportunity to just take control of what I wanted my job to look like. I was very open and honest. I communicated the things that I wanted and the things that I enjoyed doing, and sort of built a personal brand around the things that I actually wanted out of my job, which includes telling people I don't wanna have morning meetings and communicating those softer aspects of the job. The pleasant surprise is that everyone's unbelievably supportive of all of that. There's way more freedom than I expected, and this is something that I think is very counterintuitive to folks coming from an academic background. My experience in both of my more extended stints in industry is that I actually have a lot more freedom to try out the ideas I want in industry than I did in academia. In academia the freedom seemed to be there because I had all the time in the world to spend on stuff, but I didn't have the resources, or I had other pressures driving my decision making. A lot of those pressures go away when you're in industry, as they should, if you're performing to the level that you need to. I get no complaints about my performance, and I'm given a lot of freedom to try different things, like reinforcement learning, which I find to be a very fascinating, fun set of techniques.

What are the job titles of people you routinely work with inside and outside of your organization? What approaches do you find to be effective in working with them?

Based on experience at: Lead Machine Learning Engineer/Data Scientist at Walmart Technology, Walmart Technology
Data engineers are probably the most important folks to us in the data science space. They really help us get the data and get access to things in a scalable way, as well as roll out what we're doing. Communicating with them is really all about being clear on exactly what you need and exactly what deliverable you expect to get. Software engineers are a similar sort of thing. We often have to communicate what we have so that they can understand the best way to deliver it, for example in the form of a front end, in which case front-end engineers will design the front-end aspects. We don't make huge distinctions between front-end and back-end engineers. People often like to call themselves full-stack engineers. A lot of that is just semantics. The reality is you're gonna be given some software-programming-related tasks. Having UX (user experience) designers is great, I think. I think every company needs more UX people, because user experience often is not the starting point for some of these projects, and I think it should be, because ultimately it's the user who has to do something with a lot of these things. Then we have what are called product owners or project managers, and their role is to coordinate all of these folks. They're often non-technical. That means communicating with them involves really trying to stay away from jargon and stick with things like deliverables, timelines, and what you need, being very specific and precise in your communications about those things. There are some other, rarer titles, things like cloud engineer. But I think the ones that I just described are the key ones.

What major challenges do you face in your job and how do you handle them? Can you discuss a few accomplishments?

Based on experience at: Lead Machine Learning Engineer/Data Scientist at Walmart Technology, Walmart Technology
I think the two biggest challenges are ambiguity and data access. On ambiguity: people come to you and they want something, but they don't know how to describe it to you. They don't know the state of the art in the field, or what's possible. There's a huge amount of ambiguity, and you have to be comfortable making choices and testing out your own resolutions to that ambiguity. The other really big issue, which is related, is the ability to access the data that will actually answer the question at hand. Oftentimes data access is the slowest part of the project, and as a result it's very easy to take that element of ambiguity (how do I get the data, and how do I use it to my advantage?) and use it as a reason to idle and not move forward on particular aspects of the project. The reality on any project is that there's gonna be some public data set, or you could generate artificial data, or there's gonna be some preparatory work that you could do, so that when the data comes in you're prepared with your own little toolset that you've built for yourself and you're ready to go. You aren't starting from scratch the moment the data gets in. I see it as a common mistake in younger, less experienced data scientists that they think the data is the starting point. The data is just the transition from a problem that you're trying to define and understand to an implementation that you're trying to develop. We have a big application here in the office that's built around natural language processing. When it was first presented to me, there was no data. So I spent a while figuring out what the best techniques might be and constructing a sort of schematic design for a feature vector that I thought was gonna be a particularly good encoding of some of the information we might be interested in.
When the data did start to come in and we started to get some labels from users and things like that, I was pleasantly surprised to see that it only took a couple thousand of these labels and we were already getting an F1 score over 0.9, and the results were looking very promising on most of the models. All that preparatory work meant that when the data actually got there, it was just a simple matter of hitting run, and most of the work was already done. I think that's a really critical challenge that everyone's gonna face, and one that you can overcome by simply thinking carefully about your problem and being prepared for when the data is available.
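The idea of preparing before the data arrives can be sketched in a few lines. This is a minimal illustration, not the speaker's actual system: the record format, the vocabulary, the function names, and the hashing bag-of-words feature vector are all assumptions. The point is only that `run_pipeline` accepts artificial records today and real records later without changes.

```python
import random
import zlib


def make_synthetic_records(n, seed=0):
    """Generate placeholder labeled text records (entirely artificial)."""
    rng = random.Random(seed)
    signal_words = ["refund", "broken", "late"]
    neutral_words = ["order", "store", "item", "today"]
    records = []
    for _ in range(n):
        label = rng.randint(0, 1)
        words = rng.choices(neutral_words, k=4)
        if label == 1:
            # Positive examples carry one extra signal word.
            words.append(rng.choice(signal_words))
        rng.shuffle(words)
        records.append({"text": " ".join(words), "label": label})
    return records


def featurize(text, n_buckets=16):
    """Hashing bag-of-words: a fixed-length count vector for any text."""
    vector = [0] * n_buckets
    for word in text.split():
        # crc32 is stable across runs, unlike the built-in hash() for strings.
        vector[zlib.crc32(word.encode("utf-8")) % n_buckets] += 1
    return vector


def run_pipeline(records, n_buckets=16):
    """Turn raw records into (features, labels); same code for real data."""
    features = [featurize(r["text"], n_buckets) for r in records]
    labels = [r["label"] for r in records]
    return features, labels


synthetic = make_synthetic_records(200)
features, labels = run_pipeline(synthetic)
print(len(features), "records,", len(features[0]), "features each")
```

Once real labeled records exist in the same `{"text": ..., "label": ...}` shape, they can be passed to `run_pipeline` in place of `synthetic`.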

What are the recent developments in the field? How significant are these improvements over past work? What are their implications for future research & industry applications, if any?

Based on experience at: Lead Machine Learning Engineer/Data Scientist at Walmart Technology, Walmart Technology
I gave a talk on this recently to the office after that NeurIPS conference. I think one of the most significant recent developments in a lot of this modeling is attention-based models, or attention mechanisms within models. In neural networks, neural attention allows for what you can think of as a sort of dynamic mapping of particular features within different layers, maybe the input layer, maybe somewhere else in your network. It's often used in sequence-to-sequence models, but I think the realization is occurring that attention as a mechanism is much more general and much more useful in a wide set of areas. There are also a lot of really interesting questions, when you have an attention mechanism, about interpretability and what you could get out of the attention component of the model to help explain in some way some of the choices that were being made by the model. I think the combination of techniques around attention-based methods and some of the more recent developments in artificial neural networks and deep learning, as well as how all those things apply to the explainability of models, are going to be some of the more significant things that we're gonna be looking at. In a lot of industry applications, folks will start off saying they're very comfortable having a black-box model, and then the moment you deliver it to them, they'll use it. Then a couple of weeks later, they'll come back and start asking you why it's doing the things it's doing. I think that's a very natural feeling to have. When you're working in conjunction with an AI system that is making decisions to try to help you accelerate your job, you want to know how it's doing and why it's doing that. Future research doesn't necessarily need to focus on those industry elements, though. I think future research should really be focusing on how some of these mechanisms can go towards more advanced styles of computation, like compositionality.
When I was in neuroscience, we called this relational associative memory, because we know that the ability to compose elements on the fly in a dynamic way is really critical to creative thinking in humans. There are a lot of really cool studies and evidence in that space, and yet none of our models do this very well. I think when we're looking at ways to make more general systems, or systems which transfer between tasks more easily, the ability to do things like compositionality, perhaps via an attention mechanism or something else, is gonna be absolutely critical.
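To make the attention discussion in this answer concrete, here is a minimal sketch of one common formulation, scaled dot-product attention, for a single query vector in plain Python. It is not necessarily the variant the speaker had in mind, and the toy keys and values are assumptions. The returned weights are the part of the computation that can be inspected when asking what the model attended to.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query.

    Returns the attention weights and the weighted sum of the values.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Toy inputs (assumed): three key/value pairs in two dimensions.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attention([2.0, 0.0], keys, values)
print([round(w, 3) for w in weights])
```

With this query, the first and third keys score equally and both outweigh the second, so the output leans toward their values. The weighting depends on the input, which is the dynamic aspect described above.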

What was the hiring process like for your job? What were the roles of people who interviewed you? What kind of questions were asked?

Based on experience at: Lead Machine Learning Engineer/Data Scientist at Walmart Technology, Walmart Technology
I interviewed with, I believe, three or four separate folks. Three of them were technical interviews, which was more than in any of the other job interviews that I did. The questions ranged from really basic things, like "You built a binary classifier; how are you gonna evaluate its performance?", to much more complicated questions, like "What is the actual role and contribution of skip connections, or of other specific components within neural networks?" It's been a while since I've interviewed, so I don't remember many of the specific questions, but they ranged in level of abstraction and level of challenge. Some of them were more software-based: let's say someone presents you with this problem. How would you approach it? What software components do you think you're gonna need to build? What tools would you use? Others were much more theoretical and esoteric. There was one interview which was more of a behavioral or cultural interview, which was about how I prefer to work, what my history is, and what I like to do in terms of particular tasks at work. Of course, I mentioned all the non-profit work that I've been doing, and I think those sorts of things are really great contributions to an interview. You get to know the person. I think we all work best when we're happy with what we're doing. Making clear to other folks what makes you happy is a really nice way to make sure you end up in the job that you actually want.
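One possible answer to the binary classifier question above is to go beyond accuracy and report the confusion counts together with precision, recall, and F1. The sketch below is a plain-Python illustration; the function name and the toy labels are assumptions.

```python
def binary_metrics(y_true, y_pred):
    """Confusion counts plus accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": accuracy, "precision": precision,
        "recall": recall, "f1": f1,
    }


# Toy labels (assumed) with one false negative and one false positive.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Which metric matters most depends on the cost of false positives versus false negatives, and on held-out data being used for the evaluation.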

What qualities does your team look for while hiring? How does your team interview candidates?

Based on experience at: Lead Machine Learning Engineer/Data Scientist at Walmart Technology, Walmart Technology
There's sort of a baseline set of competencies that we look for. You have to be able to use the toolsets that are common in data science. We don't necessarily put a huge emphasis on your deep understanding of a particular model. As I mentioned earlier, random forest is maybe my favorite thing to use as a starting point in some of these classification problems. If I ask you a bunch of questions about random forests and it becomes clear to me very quickly that you don't know a lot about random forests beyond what they are, that's not necessarily gonna be a deal-breaker in the interview. What matters is that you know what you know, that you also know where your gaps are, and that you're interested in learning the things you don't know that are gonna be relevant to the job. Someone who comes in and is maybe less knowledgeable about a lot of the details of the models but shows a real passion to learn those things is going to be seen much more favorably than someone who maybe knows a good amount but has no interest in learning more, so that's a huge, huge component of it. Another thing that we really look for is this issue of ambiguity: the ability to deal with ill-posed problems and questions in a productive way. Some of the questions that we'll ask will be intentionally vague statements, where the goal is to hear how you handle that kind of problem, because that's something that you're really gonna deal with in the real world. We do have these technical interviews that I'll often do, and sometimes the other senior folks on the team will do them. Then we tend to have, once again, some sort of behavioral or cultural interview to make sure that you're gonna be happy in the position and that the way that we work in the office is gonna be something that you enjoy. Personally, I think most people would do really well in a situation where they're given a lot of freedom.
It can be a challenge if you respond better to very rigid, structured objectives and environments, and to being told what to do at every step. This environment may not be the right one for you, because there's a whole lot of freedom. So some of those behavioral things around self-determination and the ability to make decisions about the best use of your time are critical things we'll look for.

What are different entry-level jobs and subsequent job pathways that can lead students to a position such as yours?

Based on experience at: Lead Machine Learning Engineer/Data Scientist at Walmart Technology, Walmart Technology
One path is an entry-level position doing data science with another company. If you've worked for a year or two at another company, you have that experience and that list of projects that you've performed. The list of projects you've worked on is really the critical bit. Say there are two candidates. One worked for a company for four years and did one project for that company, and that was it. Even if it was successful, I don't necessarily think that is as good a reflection of someone as a candidate who maybe hasn't worked in data science before but has a huge list of data science projects that they've done on the side, maybe for a nonprofit, like what I like to do, or maybe just for fun, building things they enjoy. It's really that demonstration of skills and capabilities that matters more than the job history or the particular job path. I do think folks in data science who have computer science knowledge and background tend to have an easier time coping with some of the challenges, because you can think in terms of optimization, scale, and software architectures. You can build components that maybe a data scientist without as much software skill and specialized computer science knowledge wouldn't be able to build. It's really more about thinking in terms of skills, projects, and capabilities that have been demonstrated than about the particular job history in question. That would be for entry-level data science positions here. The more senior you get, in terms of being a senior or staff data scientist, the more challenging those requirements become.

How did the program prepare you for your career? Think about faculty, resources, alumni, exposure & networking. What were the best parts?

Based on experience at: Doctor of Philosophy (PhD), Neuroscience, University of Illinois at Urbana-Champaign
Doing research is a very different sort of self-determined process than what you're going to deal with in industry. In industry the objectives are often extrinsic, coming from some stakeholder or from your boss. But the abilities you gain in doing research at the university, in having to ask questions no one has ever asked before and then go answer them, that skillset is really at the core of everything that you do in any job, and you're confronted with it very directly in doing a PhD. You're confronted with ambiguity. You're confronted with the fact that you are being required to ask questions whose answers are not known. I think the best part about it is when you actually get to be the first person to know something like that. It's a rare feeling, because you are often so deep in what you're exploring, whatever your particular topic and research are, that you don't even think about it in terms of being something new. It's just some element of the puzzle you're looking at. But when you finally get to the point of publishing a paper, there's a sense of "I've now contributed to the world; I've now added knowledge where there wasn't any." You get to become part of a conversation that has been there all along. You now have all the context you need to really understand what's happening. Even being here at Walmart, I still keep up with the field and still get to have a part in that conversation about how people do creative thinking and how people solve problems that involve arbitrary relational binding, which is something none of our AI does well at this point. That conversation is really exciting, and it's really fun to get to imagine the world as it will be as we continue to answer these questions. That ability to resolve ambiguity, which comes from doing research, is truly invaluable. I actually think it goes both ways.
I finished my PhD much faster than the program average, and I think it's because I worked before I went into the PhD, so I learned some skills from industry that helped me be more successful in academia, and I think those academic skills now also make me more successful in industry. As for the Texas Tech CS program, I think the most interesting components were the electives. Elective courses are obviously a thing in every undergrad situation. I took one on neural networks. This was long enough ago that I actually went to the math department at one point and asked about artificial neural networks, and they said, "Oh, we've already proven those aren't useful. You should look at support vector machines." That was a legitimate opinion at the time. I took a graduate-level elective in neural networks, so I learned a lot of the basic math behind them. The mentorship that I got through doing that, I think, gave me the confidence that I could understand these topics and that I could actually dig into them. I didn't participate in research quite as much as I really wanted to, or should have. At that stage in my career, I was much more focused on application. I picked up projects. For example, I built a really simple computer vision program for the Vietnam War archive that was there, which just processed binary codes on images and produced the decoded version, a really simple thing like that. One thing that I didn't anticipate as I was doing my undergrad, and that has come up a lot, is how much some of the basic computational complexity theory really doesn't go away. It's always there, sitting behind everything you do: big-O notation and the ability to think in terms of the memory, space, and time constraints of what you're working in. If anything, it becomes more important the farther along you go in your career.
Some of those things seemed somewhat esoteric as I was taking them in undergrad, but I'm glad I paid enough attention that, when they came back up, I knew how to go re-learn some of the things that I needed to learn.
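As a small reminder of what thinking in big-O terms looks like, here is a sketch of two ways to check a list for duplicates. The example problem and the function names are assumptions chosen for illustration. One version trades time for memory, the other memory for time.

```python
def has_duplicate_quadratic(items):
    """O(n^2) time, O(1) extra space: compare every pair."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_linear(items):
    """O(n) expected time, O(n) extra space: remember what has been seen."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


sample = [3, 1, 4, 1, 5]
print(has_duplicate_quadratic(sample), has_duplicate_linear(sample))
```

Both functions return the same answers. On a large data set, the quadratic version's running time grows with the square of the input size, while the linear version pays for its speed with a set that can grow as large as the input.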