
This is a software-generated (AWS) transcription and it is not perfect.
I started school as a linguistics major and eventually moved into computational linguistics. At the time, data science wasn't even a term yet; the official term was really only coined back around 2010. So I started working with Mitel and Sun Microsystems in their early days, and we were trying to take things like the phone system, the PBX system, and integrate that with thin clients, so that you could take your phone, your calls, and your conferences, as well as your session from your laptop or your thin client, anywhere you went. The technology was a little bit early, and so we never ended up launching it. So I went out and started doing contracting and consulting, and eventually started working with staffing and recruiting firms so that I could go and help other people grow their careers. As time went on I really focused on that, but I was also able to grow the consulting work and help a lot of different people. In that process, I was also able to learn from others: how they were learning, what the latest and greatest technologies were, what they were doing from a technical standpoint, because I was doing technical interviews dozens of times a day. That gave me a chance to learn what was happening within the industry, what companies were doing, and what was coming up. And that's when I discovered Hadoop as a technology. I created the first Hadoop users group here in Utah, one of the first in the country, and we grew it to be one of the largest. Eventually I was invited by the White House to work with the Office of Science and Technology Policy and the NSF, the National Science Foundation, to help create the national data science organizers and the big data hubs around the country as well. So now data science and big data started becoming a thing, the whole world started changing pretty drastically, and eventually I decided to go out on my own and started my own companies to focus on big data and data science across multiple sectors, including government as well as academia, nonprofit, and for-profit industries. Really, along the way I found that it was learning itself, and then hearing from others in the network what the technologies were, how to focus on them, how to quickly pivot to learn what the state of the art was, and how to deploy that to a problem, that mattered in the long run. I'd love to say that my technical skills are absolutely incredible, but really my core skills are around finding the right use case, the right business need, presenting that actual value to a company, to a business, and making sure that the technology is sound so that it actually meets that need. So that's kind of my story, and really it's based on others.
My responsibilities include everything from building and growing the team to deploying new technologies to create new innovation. Finicity is a fintech company, and we really cover everything from the finance industry to the fintech industry: all of the big financial institutions, like the banks and the credit unions, as well as a lot of government or quasi-government type institutions, things like the GSEs, that is, Fannie Mae and Freddie Mac. We work with groups like Experian to create things like Experian Boost, or UltraFICO with the FICO team, to help them take the credit scoring models that exist today and start being more inclusive of individuals around the country, and hopefully around the world as time goes on as well. So far, we've been able to influence about 13 million credit score points across the industry, so very exciting. As for my hours: I work somewhere around 40, sometimes 50 hours a week, almost always here in the office. I can work from home, and I travel fairly often; I'd say about 25% of my time is travel. My core focuses with these teams are, first, the innovation piece: what is it that we're going to do, and how is it going to actually solve a business need, or a need for the individuals and the consumers in the finance space in general? Then distilling that into something that we can deploy. Data science is still a very difficult thing to get into production, because many of its paradigms don't necessarily match up nicely to API development, for instance, and so we focus a lot of our core time on that productionizing of models. And then, finally, it's fine-tuning models, providing additional interpretation or validation of those models, and finding ways that we can go in and make sure that they're as accurate as possible. We are a consumer reporting agency, and so the data and the results, the attributes, the features, and the flows that we create all need to be as accurate as possible, or we could negatively impact somebody's financial life, their career, their credit, and many other aspects of their lives as well. So, yeah.
So we use Python as a core technology. We do occasionally use R as well, and we've started getting more into JavaScript as a team, and I really do like JavaScript as a tool. JavaScript is much closer to the native types of things that you would see in the browser, such as JSON, binary JSON (BSON), or JSONB. From a full technology stack perspective, we are a native AWS customer, and so almost everything that we do is cloud-native. We work with the Kubernetes stack quite a bit, with Docker and other containerization technologies and tools. We work with a lot of queuing tools, and tools for publishing and subscribing to core technologies, and we're working with Kubeflow now to help with that productionizing of our models, so that you're able to go in and deploy everything in a Kubernetes-native world, get those models in and running as quickly as possible, and go through the process. I only wish JavaScript had the rich libraries that Python does. A lot of times we take the core Python code that we used to get through our modeling, get through our interpretation and validation, get through the whole process, and then export the actual model. After we've persisted it or saved it, we port everything over into JavaScript or into another technology that will help us launch it quickly. Oftentimes we do still use Flask or other core technologies within Python to be able to create those APIs, but more often than not it's pushing it to JavaScript. From a model perspective, or even on the framework side, we focus a lot on the TensorFlow ecosystem. We do occasionally work with others like PyTorch or Paddle, or other things that can help with specific algorithms or have different types of advantages, or sometimes we work directly with just Keras. With actual models, more often than not we're focusing on the Tensor2Tensor libraries. Within the TensorFlow ecosystem, Google has released a lot of open source code for models like BERT, a bidirectional algorithm that comes in both a full and a mini version, and it allows us to apply a lot of encoding or embedding and then use that for large representational transformations, so that we can do large attention networks and other things along those same lines. When it's appropriate and we don't need quite as much speed, we'll use other state-of-the-art technologies like XLNet. It's extra-large, very powerful, but it's not necessarily very fast, and without something like Kubeflow, or whatever route you take, it's not necessarily pragmatic to use in the ecosystem. Along those lines, I'd like to step back and go to the why question. The why for us, of course, is that we're always looking for accuracy, for being as correct and as close to the truth as we possibly can with any of our models and algorithms. We work to take anything that is still stochastic or random and try to push it toward being as deterministic, as true and as one-to-one a relationship, as we possibly can, decreasing the uncertainty and the entropy. We really want to minimize that entropy as much as possible, so that we're not left with a bunch of things that we just can't explain, or a bunch of things that are way too probabilistic. It's about taking a certain quantum of a thing.
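To make that persist-then-serve flow concrete, here is a minimal sketch assuming TensorFlow/Keras and Flask, both of which are mentioned above; the model architecture, file path, endpoint name, and feature shape are hypothetical illustrations, not Finicity's actual code.

    # Minimal sketch: build a Keras model, persist it, then expose it behind
    # a small Flask prediction API. Names and shapes are hypothetical.
    import numpy as np
    import tensorflow as tf
    from flask import Flask, request, jsonify

    # Build and persist a model (training omitted for brevity).
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    # model.fit(X_train, y_train) would run here before saving.
    model.save("saved_model/credit_flow")  # persisted SavedModel directory

    # Load the persisted model and serve predictions over HTTP.
    app = Flask(__name__)
    serving_model = tf.keras.models.load_model("saved_model/credit_flow")

    @app.route("/predict", methods=["POST"])
    def predict():
        # Expects JSON like {"features": [[0.1, 0.2, ...]]} with 10 values per row.
        features = np.array(request.get_json()["features"], dtype=np.float32)
        scores = serving_model.predict(features).tolist()
        return jsonify({"scores": scores})

    if __name__ == "__main__":
        app.run(port=8080)

From there, the same persisted model could be containerized with Docker and deployed onto Kubernetes, or its exported weights handed off to a JavaScript service, along the lines described above.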
And that's where a lot of quantum mechanics thinking starts playing into the world of data and data science today. Once we've determined that something is as certain as we need it to be in order to answer a question, then it becomes a question of actually creating the value that the end user and the business are going to use. Finicity is a business-to-business company, but more often than not our clients work directly with a consumer, an individual, and because of that they need things to be as close to real time, or near real time, as they can. So often, once we've determined how to answer a question, it has to be rephrased in terms of: can we do it in a time period that actually matters? Because if we can't provide the answer when it matters, it doesn't really matter whether the answer is correct or not. So we have to reframe it, and often have to choose different models because of that. Then we have to figure out what is still the best thing we can possibly provide while getting it done in a time frame that will help somebody get access to a mortgage, help somebody get access to a car loan, or whatever else it is that helps them be successful in their own financial lives. I'd say about 20 to 30% of our modeling is not a deep learning, very heavy, black-box style model, and we use those simpler models in really three potential areas. The core two areas where we use them are, first, prepping something, or handling something that can be done very quickly and that we need an answer on, but that isn't an overly complex question. It's something we've already narrowed down, and we spend a lot more time on our data engineering and feature engineering to ensure that those models perform accurately and provide results very quickly; oftentimes we really have to shrink down the data set to make sure they're going to work under that scenario. Then, on the other side, the other area we typically focus on is taking the results of the deep learning models, or of something within that entire workflow when we're doing multiple ensembles, and then we take something like XGBoost, or we'll get into an SVM, or even just other types of core regressions or other classification algorithms, to really make sure that the outputs we're receiving from the neural networks or from a deep learning model are still correct and accurate. So we almost use them just to check the accuracy and to try to understand, or provide further interpretability, or really annotations, of the models themselves. The third use case is where we do use them end to end, and these are typically situations where we need really strong explainability: we need to be able to provide coefficients that we know we can map directly back to the actual data that was used in the model, and we use them from the beginning all the way through the end of the process. But, you know, those first two probably make up 20%, maybe 25%, of what we do, and the last one only makes up 2%, maybe 5%, of what we do here.
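As a rough illustration of the second and third patterns, the sketch below uses XGBoost to cross-check a deep model's decisions and a plain logistic regression for the fully interpretable, coefficient-based case. The synthetic data, thresholds, and the use of scikit-learn (which isn't named above) are assumptions for illustration only, not the actual pipeline.

    # Rough sketch: (a) cross-check a deep model's scores with a gradient
    # boosting model, (b) keep a logistic regression whose coefficients map
    # directly back to the input features. Data here is synthetic.
    import numpy as np
    from xgboost import XGBClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))           # engineered features
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # ground-truth labels
    noise = rng.normal(scale=0.1, size=1000)
    deep_scores = 1 / (1 + np.exp(-(X[:, 0] + X[:, 1] + noise)))  # stand-in for deep model probabilities

    # Use case 2: fit a simpler model on the same features plus the deep score
    # and measure how often it agrees with the deep model's decisions.
    booster = XGBClassifier(n_estimators=50, max_depth=3)
    booster.fit(np.column_stack([X, deep_scores]), y)
    deep_decisions = (deep_scores > 0.5).astype(int)
    agreement = accuracy_score(deep_decisions,
                               booster.predict(np.column_stack([X, deep_scores])))
    print(f"agreement with deep model decisions: {agreement:.2%}")

    # Use case 3: an end-to-end interpretable model whose coefficients map
    # straight back to the original features.
    linear = LogisticRegression().fit(X, y)
    print("feature coefficients:", linear.coef_.round(2))

In practice the deep_scores column would come from the TensorFlow models described earlier, and the agreement check stands in for the kind of validation and annotation work described above.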