How limited computing resources shaped the career of a machine learning optimization master
Watching Iron Man as a college student in his hometown of Tijuana, Mexico, Carlos Huertas was struck by one character in particular: J.A.R.V.I.S., the butler-like artificial assistant embedded in Tony Stark’s armor.
Even though it was only a movie, Huertas knew it foreshadowed real-life potential.
“I was fascinated by that level of technology,” he says.
At the time, he was pursuing a bachelor’s degree in computer engineering at the Universidad Autónoma de Baja California. Inspired by J.A.R.V.I.S.’s impressive communication skills, Huertas decided to pursue a master’s in natural language processing (NLP) at the same university.
That early shift to artificial intelligence ultimately brought him to a major technology center, where he is a manager of machine learning on the Buyer Risk Prevention team in Seattle, which is responsible for protecting customers from fraud and abuse.
Doing more with less
The master’s program was challenging, as NLP requires a lot of hardware horsepower that wasn’t available to Huertas at the time.
“Back then, you needed huge machines to achieve interesting things, which I didn’t have,” he says. “We were a very humble facility and had regular consumer computers, so it was hard for me to try to match what people were doing with more resources.”
The limited computing resources forced him to think outside the box and develop creative solutions to do more with less. The challenge energized him, and for his PhD, he turned to the field of machine learning optimization, specifically feature selection for high-dimensional spaces.
That area of machine learning involves designing algorithms that help a machine focus solely on the features that are relevant to a specific task. One example where feature selection may be used is the “cat vs dog” image classification task, a classic machine learning project for beginners that involves classifying photos as containing either a dog or a cat.
Those animals have numerous features, such as color, height, weight, tail, nose shape, and eye color. Humans use their knowledge of the world to understand which of these help differentiate them. For example, size might be important, as most dogs tend to be bigger, but a tail might not be very useful, since both animals have one.
“How do we make sure a machine learns this on its own? Feature selection is the process to help the computer understand that some of the characteristics are more important than others, so it can focus on what matters most and achieve similar or even better level of performance without so much computing power,” Huertas says.
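The cat-versus-dog intuition above can be made concrete with a small sketch. This is not Huertas's method; it is a generic filter-style feature selection, assuming a made-up tabular dataset (the feature names `weight_kg`, `has_tail`, and `eye_darkness` and all of the numbers are hypothetical). Each feature is scored by how far apart the two class means sit relative to the spread within each class, and only the top-scoring features are kept.

```python
import random
import statistics


def make_toy_animals(n_per_class=200, seed=0):
    """Generate hypothetical (features, label) rows; every number here is invented."""
    rng = random.Random(seed)
    rows = []
    for label, mean_weight in (("cat", 4.0), ("dog", 20.0)):
        for _ in range(n_per_class):
            features = {
                "weight_kg": rng.gauss(mean_weight, 3.0),  # differs by class
                "has_tail": 1.0,                           # identical for both classes
                "eye_darkness": rng.gauss(0.5, 0.2),       # same distribution for both
            }
            rows.append((features, label))
    return rows


def separation_score(rows, feature):
    """Absolute gap between the class means, divided by the pooled standard deviation."""
    cats = [f[feature] for f, y in rows if y == "cat"]
    dogs = [f[feature] for f, y in rows if y == "dog"]
    pooled_std = ((statistics.pvariance(cats) + statistics.pvariance(dogs)) / 2) ** 0.5
    if pooled_std == 0:
        return 0.0  # a constant feature cannot separate the classes
    return abs(statistics.mean(cats) - statistics.mean(dogs)) / pooled_std


def select_features(rows, k):
    """Rank all features by separation score and keep the k best."""
    names = list(rows[0][0].keys())
    ranked = sorted(names, key=lambda n: separation_score(rows, n), reverse=True)
    return ranked[:k]


rows = make_toy_animals()
scores = {name: separation_score(rows, name) for name in rows[0][0]}
selected = select_features(rows, k=1)
print(scores)
print(selected)
```

In this toy setup, weight separates the classes strongly, while the tail feature scores zero because it is the same for every animal. A model trained only on the selected column would have less data to process, which is the "more with less" trade-off the article describes. Real high-dimensional problems usually also need to account for interactions between features, which a per-feature score like this one ignores.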
Solving customer problems with machine learning
Huertas routinely applies feature selection in his work at the technology center.
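The Buyer Risk Prevention team, Huertas explains, is responsible for making sure the store is safe and trustworthy for customers and selling partners. “In the spirit of one of our main leadership principles, customer obsession, we keep innovating and never stop working to provide the best experience for all of our customers,” he notes. “To do that, we identify pain points and solve them with technology.”
To serve customers properly, the center created a team in 2019 focused on mitigating the problems customers might face when seeking account support; that is the team Huertas currently leads. The team develops machine learning solutions that help customers resolve account issues.
“The algorithm will try to review the case on its own using artificial intelligence and determine the right course of action for the customer,” he says. “That way, we can provide faster support.”
As the center grows, so do its data volume and system complexity. In that context, it is important to know which features are relevant to determining whether an issue is legitimate.
“This is a perfect match for feature selection, where we ask: ‘Can we be smarter and choose what we should focus on, so that our models reach their best performance without running into scalability issues?’” he says.
Huertas’s team focuses on providing faster and more accurate responses to customers’ concerns about their account status. Now, customers who run into a problem can regain access without going through a complicated process. Huertas thinks of his own parents, who are customers but might struggle to communicate with the platform through third-party systems such as email.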
Huertas says his background as an assistant professor at Universidad Autónoma de Baja California, where he taught object-oriented programming and web development, helped shape him into a team player and a leader.
“In academia, we have this common phrase that the student doesn't fail, it is the professor who fails,” Huertas says. “When I was a professor, I felt this need to push my students forward. And that's something that I still carry with me on my team. I feel a lot of satisfaction seeing my team members develop.”
Discussion grandmaster on Kaggle
Back when Huertas was a PhD student, he joined Kaggle, an online data science and machine learning community. His goal: use the platform to test some of his PhD ideas and see how they fared against real-life problems. Because of his frequent interactions on the platform, where he still serves as a mentor to many of his peers, he holds the title of “discussion grandmaster” and was once one of the five most active users in the forum — among almost 5 million users.
“The community has always been very friendly, and newcomers ask a lot of questions on how to get started,” he says.
At Kaggle, companies promote competitions to solve real-life machine learning problems.
“It's especially useful when you're a student, because in academia you won't have access to the type of problems that a major technology center might have. Getting exposure to those problems without the need to have a job there really helps you to develop your skills,” Huertas says.
In one of those competitions, when Huertas was still a PhD student, he finished among the top nine contestants out of thousands of scientists around the world. He was competing with a laptop that, he recalls, “could barely run more than a browser.” The experience taught him a lot about how constraints can be empowering.
“It forced me to develop my own packages. And in the process, I learned how things work behind the scenes,” he says. When people have a lot of computing power, he notes, they might forget about the importance of optimization and rely on pre-built packages that operate like a black box.
“When you don't understand what is the magic happening behind the scenes, it is very hard to progress beyond that,” he says.
His prominence on Kaggle drew interest from a Los Angeles-based company that offers underwriting analysis for lenders. After a stint building machine learning models for them, he joined another company, where he helped launch its first customer retention platform by building machine learning models to analyze which customers were more prone to abandon the platform.
Shortly after that, recruiters from the technology center reached out, and he accepted a position on the Buyer Risk Prevention team.
“I like that the center puts a huge emphasis on matching your skills with the role,” Huertas says. “While other companies might have generic roles, like data scientist, the center has very specialized roles, such as applied scientist, research scientist, data engineer, machine learning engineer. That ensures that you're going to focus exactly on what you like.”
The advice he gives younger scientists is to always practice what you learn in academia in a real-life setting. He compares it with a sport: You can read several books about soccer, but if you’ve never kicked a ball, it will be very tough to play.
“It is very important that you materialize that theory into practice,” he says. “If you are still doing your PhD, there are platforms like Kaggle that will provide you with data so that you can practice your skills. By the time you complete your studies, you will have two or three years of technical experience in the field, working with real problems. That will take you very far.”