
This is a software-generated (AWS) transcription and it is not perfect.
I started my career doing an undergrad in computer science at Texas Tech. After leaving Texas Tech, I took a job as a software engineer at a small company that did automotive software. I started my first day and it was a fairly terrible experience. I ended up lasting about three and a half weeks and then quitting that job, which turned out to be one of the best decisions I ever made. It was a bit of a scramble for maybe four or five months after that, looking for what I wanted to do. Normally you arrange your job before your graduation, so you know what you're going to do when you leave school. Having that swept off the table was a little tough, but I ended up finding a really amazing job at a company called National Instruments, which does test and measurement, as an embedded vision systems engineer. That was a great opportunity. The company really took a lot of time to invest in its employees, so I had a lot of chances to learn and to grow my skills, both communication skills and technical skills. Looking back, I think that choice to leave the first job was probably the most pivotal choice I've ever made in terms of both my personal life and my career, because the job at National Instruments required me to move to another city, Austin, Texas, from Houston, where the first job was. Basically everything great that's happened to me since then has been shaped by things that happened after that move. When I moved to Austin, I started looking for nonprofit opportunities, places to volunteer, and found a great animal shelter in town that was treating a disease called parvo. I started volunteering there in my free time. I ended up meeting the person who would become my wife doing that volunteer activity, so that's obviously pretty important to me. At the same time, I started delving into data analytics on behalf of the shelter. I was from a computer science background, so that wasn't terribly difficult. That set me off on a path of thinking about data science in a little bit of a different light, in a more practical setting than what I was seeing in my actual day job, which was around embedded vision systems. At the same time, I had always had a strong interest in how brains work and how human intelligence functions. Artificial intelligence is an area that I focused on during my undergrad, so I was looking for opportunities to expand my knowledge in that area: I read some textbooks and worked on expanding my fundamental knowledge. Ultimately, at some point, my wife decided she wanted to go to vet school. It had been a lifelong dream for her, so it wasn't a terribly hard decision, and we decided to apply to overlapping schools. I applied to computational neuroscience programs and she applied to vet schools, and we only applied to places that had both of those options. Ultimately, we both ended up at the University of Illinois, and I started doing my PhD in neuroscience there, with a focus on computational cognitive neuroscience, which is just a fancy way of saying trying to write algorithms and math that reflect human behavior. I studied people who have what's called hippocampal damage. If you've ever seen the movie Memento, that's basically what hippocampal damage looks like when you have it very severely: you can't remember things, and it's like you're always living in the now, but you never actually create new episodic memories.
I took a bit of an artificial intelligence angle on all of that, and ultimately that led me to look for a more data science-oriented position so I could do AI after I left grad school. It was a little bit of a tough decision not to stay in academia. What ultimately led to my choice: I knew I wanted to come back to Austin, so I limited what I was applying to based on that. There were a couple of really cool opportunities, including a lab in Austin that I was interested in joining. But when it came to what I wanted my career to be going forward, I really wanted to grow my technical computational skills. I had been spending a lot of time focusing on behavior and other things like that, so I wanted to get back to writing code every day, and I picked the opportunity that gave me the best chance to do those sorts of things. When I got to Walmart, the job I started after my PhD, I found that our office works for internal customers; we don't work on consumer data or anything like that. So I found all sorts of interesting problems and really found there was a huge opportunity to contribute with the unique set of skills that I had been developing over the last eight years of my career. Now I lead the team of data scientists here in the Austin office. I've got a group of about eight people doing data science, and I'm really enjoying all the challenges and opportunities that come with working for such a big company. I'm also still exploring some of the ideas related to my PhD on the side. I get a lot of freedom to work on projects that I think are interesting, so I'm taking that opportunity to continue to explore the theoretical aspects of intelligence and memory and neuroscience. That's my story.
I like to describe my primary job as applying vision to a data science project. The way you learn data science is often by being handed a nice, clean, technical description of a problem. That technical description does not appear out of nowhere and is often very hard to dig up. So you have to be an effective communicator with non-technical people, to act as a translator for what their intentions are, what their objectives are, then figure out what data sources you might need and whether you have access to them, and assemble all of that together. Finally, you end up with some sort of problem definition that everyone agrees on, and that's where the technical skills come into play. All of those soft skills are things that you pick up as you gain more experience: how to ask the right questions, the politics of some of those things. And then I construct a vision for how a particular project should be executed in order to accomplish the objective, and that includes all the way down to the details: how are you going to pick the model, how are you going to deploy it to a cloud service (we use a lot of cloud tools in our office), what validation methods are you going to use, and what are you going to do when all of that goes wrong, if it doesn't work the way you expect it to. I lead on all of those aspects for the projects that come through the office, which also involves a lot of mentorship. Obviously I want to grow the people on my team so that one day they're able to do the things that I'm doing right now, and that involves explaining why I'm making choices the way I am and really working through the rationale of not just solving a data science problem, but solving a problem that is usually very ambiguous and ill-defined at the beginning. In terms of office hours and what my day looks like: the culture here is one that really supports you working the way that works best for you. I've never been a morning person; I hate waking up early in the morning. Sometimes you still have to go to a meeting here or there, but on most days I can get away with coming in at 10 or 10:30 and starting my day a little bit late, which lets me dodge traffic. That's a bit of an optimization I get to do in my life: I don't have to deal with rush hour traffic in the morning. Then I'll sometimes do the same on the way home, so I'll leave at 4 and work my last hour or two at home, dodging the five o'clock rush hour traffic. There's a lot of flexibility in terms of office hours. If I need to take a day off, I can do that. There's not a strict mechanism through which I need to manage my time off; it's all about getting the job done, and the more senior you get, I think the more that's the pattern. So if you learn to be efficient in managing your time, you get free time to do things like volunteering: I still volunteer at Austin Pets Alive! and work on data science and research for them. That would be how I typically describe my schedule: it starts late and ends early at the office, with a little work from home as needed. In terms of travel, there are lots of travel opportunities, but I tend to stick to the high-priority travel. I just got done doing a lot of travel to help start another office doing data science.
I got to go to the NeurIPS conference back in December, which was a wonderful conference experience and something that I'm encouraged to do if I have the time. Then sometimes there are smaller trips to meet with cloud providers or other things like that, to try and spec out a project, so the travel is variable. I try to limit it to no more than 10 to 15% of my time spent traveling. I do work from home often once a week, sometimes once every other week, depending on what's needed for my job. I think those are important days to take, because when you're in the office there are a lot of distractions, a lot of things pulling you away from your programming or other goals. So taking a work-from-home day once a week is something that's pretty encouraged in our office.
For data science, we pretty much use Python for everything. It depends on the exact task, but scikit-learn is totally critical for any basic modeling effort. Algorithm-wise, random forest seems to come back up repeatedly as, as far as simple models go, one of the better classifier methods. We do use AutoML frameworks; I think that's becoming a really nice way to check how well you're really doing if you build a hand-designed model. It's a good benchmark to compare what you're doing against, and if you have access to big cloud computing resources, AutoML is very quick and easy to train. That's the starting point in terms of the tool stack. Moving up in complexity, we do a lot of natural language processing in our office, so we use things like spaCy, which is a great library for natural language processing, and different sorts of embedding methods. We really tend to stay near the cutting edge on stuff like embeddings, so we also use work out of Google and Facebook and other places like that, as well as building custom models in things like TensorFlow and so on as needed. That's our typical tool stack on the data side. We do like deploying things on Azure, so Azure is where we tend to do a lot of our cloud compute, and my personal preference in a lot of these things is basically the stuff I just described. I do think there are times when you need some additional optimization, and we'll use things like Scala, or even C or Cython, as needed for that.

The reasoning behind Python is probably pretty self-explanatory if you're doing data science already: that's where most of the tools are. It's what became the de facto standard during the past 10 years or so, and if you want to find a tutorial or a library to do something, it's probably going to be in Python. Then there are some reasons you might want to go to something like C#, as I said, for optimization purposes or for scalable, enterprise-style code, and we'll do integrations as necessary for those other languages. So when you're working with a software engineer, you may deliver them some Python code and they'll find a way to integrate it, via the command line or via a service API or something like that, as needed for the different projects.

We have used H2O pretty extensively. It is not the easiest to work with of all of the AutoML frameworks, but it is definitely very scalable, and when you're working on the giant data sets you can imagine at Walmart scale, that's pretty critical. H2O has been one that we've consistently found success with. Which AutoML framework you choose is an interesting question, depending on the size of the data set you're working with and how hands-off you're willing to be with things like performance validation. There are some tools in GCP and Azure that do AutoML; we tend to stay away from ones that are highly coupled to the cloud provider, just in case we need to switch cloud providers at some point for some reason. And then for local AutoML on just your laptop or something, auto-sklearn is great, TPOT is great. Both of those libraries are ones that I've found pretty good success with.
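To make the "hand-built baseline versus AutoML benchmark" idea above concrete, here is a minimal sketch in Python: a scikit-learn random forest evaluated with cross-validation, then a local TPOT run scored on the same held-out split. The dataset, hyperparameters, and output file name are placeholders for illustration, not anything from the actual projects discussed, and it assumes the classic TPOT interface (TPOTClassifier with fit, score, and export).

    # Minimal sketch: hand-built random forest baseline vs. a local AutoML run (TPOT).
    # Data and settings are placeholders, not anything project-specific.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split, cross_val_score
    from tpot import TPOTClassifier

    # Stand-in data; in practice this would come from whatever source the project uses.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # 1) Simple hand-designed baseline: a random forest with cross-validated accuracy.
    rf = RandomForestClassifier(n_estimators=200, random_state=42)
    cv_scores = cross_val_score(rf, X_train, y_train, cv=5)
    rf.fit(X_train, y_train)
    print(f"Random forest CV accuracy:   {cv_scores.mean():.3f}")
    print(f"Random forest test accuracy: {rf.score(X_test, y_test):.3f}")

    # 2) Local AutoML benchmark: let TPOT search pipelines on the same training data,
    #    then compare its held-out score against the hand-built baseline.
    tpot = TPOTClassifier(generations=5, population_size=20, cv=5,
                          random_state=42, verbosity=2)
    tpot.fit(X_train, y_train)
    print(f"TPOT test accuracy:          {tpot.score(X_test, y_test):.3f}")

    # Export the best pipeline TPOT found as plain scikit-learn code for review.
    tpot.export("best_pipeline.py")

If the AutoML run clearly beats the baseline, that is a signal the hand-designed model is leaving performance on the table; if they are close, the simpler, more interpretable model is often the better choice.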