For the first three years of OpenAI, I dreamed of becoming a machine learning expert but made little progress towards that goal. Over the past nine months, I’ve finally made the transition to being a machine learning practitioner. It was hard but not impossible, and I think most people who are good programmers and know (or are willing to learn) the math can do it too. There are many online courses to self-study the technical side, and what turned out to be my biggest blocker was a mental barrier — getting ok with being a beginner again.

Studying machine learning during the 2018 holiday season.

A founding principle of OpenAI is that we value research and engineering equally. Our goal is to build working systems that solve previously impossible tasks, so we need both. (In fact, our team is made up of 25% people primarily using software skills, 25% primarily using machine learning skills, and 50% doing a hybrid of the two.) So from day one of OpenAI, my software skills were always in demand, and I kept procrastinating on picking up the machine learning skills I wanted.

After helping build OpenAI Gym, I was called to work on Universe. And as Universe was winding down, we decided to start working on Dota — and we needed someone to turn the game into a reinforcement learning environment before any machine learning could begin.

Dota

Turning such a complex game into a research environment without source code access was awesome work, and the team’s excitement every time I overcame a new obstacle was deeply validating. I figured out how to break out of the game’s Lua sandbox, LD_PRELOAD in a Go GRPC server to programmatically control the game, incrementally dump the whole game state into a Protobuf, and build a Python library and abstractions with future compatibility for the many different multiagent configurations we might want to use.

But I felt half blind. At Stripe, though I gravitated towards infrastructure solutions, I could make changes anywhere in the stack since I knew the product code intimately. In Dota, I was constrained to looking at all problems through a software lens, which sometimes meant I tried to solve hard problems that could be avoided by just doing the machine learning slightly differently.

I wanted to be like my teammates Jakub Pachocki and Szymon Sidor, who had made the core breakthrough that powered our Dota bot. They had questioned the common wisdom within OpenAI that reinforcement learning algorithms didn’t scale. They wrote a distributed reinforcement learning framework called Rapid and scaled it exponentially every two weeks or so, and we never hit a wall with it. I wanted to be able to make critical contributions like that, ones that combined software and machine learning skills.

In July 2017, it looked like I might have my chance. The software infrastructure was stable, and I began work on a machine learning project. My goal was to use behavioral cloning to teach a neural network from human training data. But I wasn’t quite prepared for just how much I would feel like a beginner.

I kept being frustrated by small workflow details which made me uncertain if I was making progress, such as not being certain which code a given experiment had used or realizing I needed to compare against a result from last week that I hadn’t properly archived. To make things worse, I kept discovering small bugs that had been corrupting my results the whole time.
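
For illustration only, here is a minimal Python sketch of the kind of bookkeeping that addresses both problems above: recording which code an experiment used, and archiving results so they can be compared later. It is not the tooling used on the Dota project. The names `current_commit` and `archive_run`, the `runs` directory, and the record layout are all hypothetical, and the sketch assumes the experiment code lives in a git repository.

```python
import hashlib
import json
import subprocess
import time
from pathlib import Path


def current_commit():
    """Return the current git commit hash, or "unknown" if it cannot be read."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or we are not inside a repository with commits.
        return "unknown"


def archive_run(config, metrics, root="runs"):
    """Write one JSON record for an experiment and return the file path."""
    # A short hash of the config makes runs with identical settings easy to spot.
    config_json = json.dumps(config, sort_keys=True)
    config_id = hashlib.sha256(config_json.encode()).hexdigest()[:8]
    record = {
        "commit": current_commit(),
        "config": config,
        "metrics": metrics,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    run_dir = Path(root)
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / f"{int(time.time())}-{config_id}.json"
    path.write_text(json.dumps(record, indent=2))
    return path
```

Calling `archive_run({"lr": 0.001}, {"loss": 1.5})` at the end of a run leaves a small file that answers "which code and settings produced last week's number?" without relying on memory. The commit hash only identifies the code if the working tree was clean when the experiment ran.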

I didn’t feel confident in my work, but to make it worse, other people did. People would mention how hard behavioral cloning from human data is. I always made sure to correct them by pointing out that I was a newbie, and this probably said more about my abilities than the problem.

It all briefly felt worth it when my code made it into the bot, as Jie Tang used it as the starting point for creep blocking which he then fine-tuned with reinforcement learning. But soon Jie figured out how to get better results without using my code, and I had nothing to show for my efforts.

Time out

After we lost two games in The International in 2018, most observers thought we’d topped out what our approach could do. But we knew from our metrics that we were right on the edge of success and mostly needed more training. This meant the demands on my time had relented, and in November 2018, I felt I had an opening to take a gamble with three months of my time.

Team members in high spirits after losing our first game at The International.

I learn best when I have something specific in mind to build. I decided to try building a chatbot. I started self-studying the curriculum we developed for our Fellows program, selecting only the NLP-relevant modules. For example, I wrote and trained an LSTM language model and then a Transformer-based one. I also read up on topics like information theory and read many papers, poring over each line until I fully absorbed it.

It was slow going, but this time I expected it. I didn’t experience flow state. I was reminded of how I’d felt when I just started programming, and I kept thinking of how many years it had taken to achieve a feeling of mastery. I honestly wasn’t confident that I would ever become good at machine learning. But I kept pushing because... well, honestly because I didn’t want to be constrained to only understanding one part of my projects. I wanted to see the whole picture clearly.

My personal life was also an important factor in keeping me going. I’d begun a relationship with someone who made me feel it was ok if I failed. I spent our first holiday season together beating my head against the machine learning wall, but she was there with me no matter how many planned activities it meant skipping.

One important conceptual step was overcoming a barrier I’d been too timid to tackle with Dota: making substantive changes to someone else’s machine learning code. I fine-tuned GPT-1 on chat datasets I’d found, and made a small change to add my own naive sampling code. But it became so painfully slow as I tried to generate longer messages that my frustration overwhelmed my fear, and I implemented GPU caching, a change which touched the entire model.
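
To make the slowdown concrete, here is a minimal sketch of what a naive sampling loop can look like, written in plain Python. It is an illustration under stated assumptions, not the code from this project: `sample_token` and `generate` are hypothetical names, `next_logits` is a stand-in for a model's forward pass, and temperature sampling is just one simple choice. The sketch does not implement the caching itself, whose details are not described here.

```python
import math
import random


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index from unnormalized logits using temperature sampling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return index
    return len(weights) - 1  # guard against floating-point rounding


def generate(next_logits, prompt, max_new_tokens, temperature=1.0, rng=random):
    """Naive generation: call the model on the full prefix at every step."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # The whole prefix is reprocessed each time, so each step gets more
        # expensive as the message grows. Reusing per-token intermediate
        # results between steps is what a cache would avoid recomputing.
        logits = next_logits(tokens)
        tokens.append(sample_token(logits, temperature, rng))
    return tokens
```

Because every step reprocesses all earlier tokens, the total work grows roughly quadratically with message length, which is one plausible reason longer messages became painfully slow.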

I had to try a few times, throwing out my changes as they exceeded the complexity I could hold in my head. By the time I got it working a few days later, I realized I’d learned something that I would have previously thought impossible: I now understood how the whole model was put together, down to small stylistic details like how the codebase elegantly handles TensorFlow variable scopes.

After three months of self-study, I felt ready to work on an actual project. This was also the first point where I felt I could benefit from the many experts we have at OpenAI, and I was delighted when Jakub and my co-founder Ilya Sutskever agreed to advise me.

We started to get very exciting results, and Jakub and Szymon joined the project full-time. I feel proud every time I see a commit from them in the machine learning codebase I’d started.

I’m starting to feel competent, though I haven’t yet achieved mastery. I’m seeing this reflected in the number of hours I can motivate myself to spend focused on machine learning work: I’m now at around 75% of the number of hours I have historically spent coding.

But for the first time, I feel that I’m on trajectory. At first, I was overwhelmed by the seemingly endless stream of new machine learning concepts. Within the first six months, I realized that I could make progress without constantly learning entirely new primitives. I still need to get more experience with many skills, such as initializing a network or setting a learning rate schedule, but now the work feels incremental rather than potentially impossible.

From our Fellows and Scholars programs, I’d known that software engineers with solid fundamentals in linear algebra and probability can become machine learning engineers with just a few months of self-study. Yet somehow I’d convinced myself that I was the exception and couldn’t learn. I was wrong: even embedded in the middle of OpenAI, I couldn’t make the transition because I was unwilling to become a beginner again.

You’re probably not an exception either. If you’d like to become a deep learning practitioner, you can. You need to give yourself the space and time to fail. If you learn from enough failures, you’ll succeed — and it’ll probably take much less time than you expect.

At some point, it does become important to surround yourself with existing experts. And that is one place where I’m incredibly lucky. If you’re a great software engineer who reaches that point, keep in mind there’s a way you can be surrounded by the same people as I am: apply to OpenAI!