Cognition, a recently formed AI startup backed by Peter Thiel’s Founders Fund and tech industry leaders including former Twitter executive Elad Gil and DoorDash co-founder Tony Xu, has announced a fully autonomous AI software engineer named “Devin.”
There are several coding assistants out there, including the popular GitHub Copilot, but Devin is said to stand out from the crowd for its ability to handle entire development projects from start to finish, from writing the code and fixing the related bugs through to final execution. It is the first offering of its kind, and the startup has demonstrated that it can even take on projects on Upwork.
Devin’s announcement marks a significant shift in the AI-assisted development space. Instead of just a copilot that writes basic code or suggests snippets, engineers get a full-fledged AI worker for their projects.
For now, however, Devin remains under wraps, and the company is opening access to only a select few customers, including Bloomberg journalist Ashlee Vance, who has written about his experience.
What exactly can Devin do?
In a blog post today on the Cognition website, Scott Wu, founder and CEO of Cognition and an award-winning competitive programmer, explains that Devin has access to common developer tools, including its own shell, code editor and browser, within a sandboxed computing environment, which it uses to plan and execute complex engineering tasks that require thousands of decisions.
Human users type a natural language prompt into Devin’s chatbot-style interface, and the AI software engineer takes it from there, developing a detailed step-by-step plan to solve the problem. It then starts the project using its developer tools, just as a human would, writing its own code, fixing problems, testing, and reporting on its progress in real time, so the user can keep an eye on everything as it works.
If something looks off to the human observer, the user can jump into the chat interface and give the AI a command to fix it. This allows engineering teams to delegate some projects to the AI and focus on more creative tasks that require human intelligence, Cognition says.
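Cognition has not disclosed how Devin works internally, so the workflow described above can only be illustrated loosely. The following is a minimal Python sketch of a plan, execute and report loop with a human in the loop. Every name in it (`Step`, `AgentSession`, `make_plan`, the stub tools, the hard-coded plan) is a hypothetical stand-in invented for illustration, not part of Devin or any Cognition API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    tool: str            # which tool handles the step: "shell", "editor" or "browser"
    action: str          # human-readable description of the work
    done: bool = False


@dataclass
class AgentSession:
    """Hypothetical stand-in for an autonomous coding session (not Devin's real design)."""
    tools: Dict[str, Callable[[str], str]]
    plan: List[Step] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def make_plan(self, prompt: str) -> None:
        # Assumption: a real agent would ask a model for the plan; here it is hard-coded.
        self.plan = [
            Step("browser", f"read docs relevant to: {prompt}"),
            Step("editor", "write code"),
            Step("shell", "run tests"),
        ]

    def run(self, on_progress: Callable[[str], Optional[str]]) -> None:
        i = 0
        while i < len(self.plan):
            step = self.plan[i]
            result = self.tools[step.tool](step.action)
            step.done = True
            message = f"[{step.tool}] {step.action} -> {result}"
            self.log.append(message)
            # Report progress; the human observer may answer with a correction,
            # which is queued as the very next step of the plan.
            correction = on_progress(message)
            if correction:
                self.plan.insert(i + 1, Step("editor", correction))
            i += 1


def stub_tool(name: str) -> Callable[[str], str]:
    # Stub tools only acknowledge the request; nothing is actually executed.
    return lambda action: f"{name} ok"


def observer(message: str) -> Optional[str]:
    # The human steps in once, right after the test run is reported.
    return "fix failing test" if "run tests" in message else None


session = AgentSession(tools={n: stub_tool(n) for n in ("shell", "editor", "browser")})
session.make_plan("add a login page")
session.run(observer)

for line in session.log:
    print(line)
```

The point of the sketch is the shape of the loop: the plan is data that can be amended mid-run, and every completed step is reported back before the next one starts, which is what lets a human observer intervene.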
In this way, Devin presents a new paradigm that offers a glimpse into how all software development and computer tasks in general could be performed in the near future: by AI workers supervised by human supervisors/users.
Can handle a wide range of development tasks
According to the demos shared by Wu, Devin can handle a variety of tasks in its current form. These range from typical engineering projects, such as deploying and improving an app or website end to end and finding and fixing bugs in a codebase, to more complex tasks, such as setting up fine-tuning for a large language model from a link to a research repository on GitHub, or learning how to use unfamiliar technology.
In one case, it learned from a blog post how to run code that generates an image with a hidden message. In another, it handled an Upwork project, writing and debugging the code to run a computer vision model.
On the SWE-bench test, which challenges AI assistants with GitHub issues from real-world open source projects, the AI software engineer correctly resolved 13.86% of all cases without human assistance. In comparison, Claude 2 resolved only 4.80% of the issues, while SWE-Llama-13b and GPT-4 handled 3.97% and 1.74%, respectively. All of those comparison models also needed assistance: they had to be told which files needed to be modified.
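To put those figures side by side, here is a small Python sketch that computes how many times higher Devin's reported resolution rate is than each comparison model's. The numbers are exactly the ones quoted above; the helper name `relative_to` is made up for this example. The ratios are not a like-for-like measurement, since the comparison models were told which files to edit and Devin was not.

```python
# Scores as reported in the article (percentage of SWE-bench issues resolved).
reported = {
    "Devin": 13.86,
    "Claude 2": 4.80,
    "SWE-Llama-13b": 3.97,
    "GPT-4": 1.74,
}


def relative_to(baseline: str, scores: dict) -> dict:
    """Return baseline_score / other_score for every other entry, rounded to 1 decimal."""
    base = scores[baseline]
    return {
        name: round(base / value, 1)
        for name, value in scores.items()
        if name != baseline
    }


ratios = relative_to("Devin", reported)
for name, ratio in ratios.items():
    print(f"Devin's reported rate is about {ratio}x that of {name}")
```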
Core technology not yet explained
Using AI in software development is nothing new. From the popular GitHub Copilot and StarCoder to Replit, which has several small AI coding models on Hugging Face, and Codeium, which recently raised a $65 million Series B at a $500 million valuation, tools in this space have been around for quite some time.
However, most of these products have primarily focused on using AI to assist with coding, accelerating a team’s workflow by generating basic code from a text prompt and summarizing or searching for snippets with relevant IDE context. With Devin, Cognition goes one step (or several steps) further, letting a full-fledged AI worker handle an entire project.
While the tool still needs to be tested, its ability to handle multiple steps to complete a software engineering project while staying on track is its biggest unique selling point. Cognition did not disclose exactly how it achieved this feat, or whether it used its own proprietary model or a third-party model, but did note that the work was the result of “advancements in long-term reasoning and planning.”
The company is currently ramping up capacity and offering early access to Devin to a limited number of users. It says anyone interested in using Devin for their engineering work can request access by contacting the company via email. Broader access is expected at a later stage.
Cognition also notes on its website that coding is “just the beginning.” This seems to indicate that the same reasoning advances could be leveraged to roll out similar AI agents/workers in other fields as well. The company has raised $21 million in funding to date.