Code LLMs from Meta, GitHub, Amazon, and Google: What code generators do, how Code Llama benchmarks, and the copyright questions around them
Meta says Code Llama outperformed publicly available LLMs in benchmark testing, but it did not specify which models it tested against. The company said Code Llama scored 53.7 percent on the HumanEval code benchmark and was able to write code based on a text description.
“Programmers are already using LLMs to assist in a variety of tasks, ranging from writing new software to debugging existing code,” Meta said in a blog post. “The goal is to make developer workflows more efficient so they can focus on the most human-centric aspects of their jobs.”
Code generators are already helping developers work. In March, GitHub launched a version of Copilot powered by OpenAI’s GPT-4 to write and check code quickly. Copilot can also rewrite older code. Amazon’s AWS has CodeWhisperer, which likewise writes, checks, and updates code. Google also has a code-writing tool, AlphaCode, but it has not been released yet.
GitHub’s parent company, Microsoft, and OpenAI are being sued for allegedly violating copyright law with Copilot because the tool can reproduce licensed code.
Code Llama’s open weights: What developers can build on Meta’s model, and how it compares with GitHub’s Copilot
Releasing the weights of the neural network to the community is an exciting step, according to Deepak Kumar, a researcher focused on artificial intelligence.
Kumar says the release of Meta’s general-purpose language model, Llama 2, led to the formation of communities dedicated to discussing how it behaves and how it can be modified. Open weights, he says, give researchers a little more flexibility to experiment with what is going on under the hood, compared with closed source models such as those from OpenAI.
Kumar says developers are likely to build new kinds of applications using Code Llama. For example, it could be possible to create a programming assistant that performs various additional safety checks before recommending a chunk of code, says Kumar, whose own research has explored how AI assistance can sometimes lead to less secure code. Kumar adds that the release could inspire the creation of assistants specialized for particular kinds of coding. He says that you could build all sorts of tooling on top of the model.
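To make the idea of an assistant that runs safety checks before recommending code more concrete, here is a minimal Python sketch. It is an illustration only, not something Kumar or Meta describes: the function names (`safety_issues`, `recommend`) and the small deny-list are hypothetical, and the candidate snippets are plain strings standing in for whatever a model such as Code Llama would generate.

```python
import ast
from typing import List, Optional

# Hypothetical deny-list for this sketch. A real assistant would rely on far
# richer checks, such as linters, static analyzers, and dependency audits.
RISKY_CALLS = {"eval", "exec", "os.system"}


def call_name(node: ast.Call) -> str:
    """Return a dotted name for simple calls like eval(...) or os.system(...)."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return ""


def safety_issues(code: str) -> List[str]:
    """List the problems found in a suggested snippet; empty means no flags."""
    try:
        tree = ast.parse(code)
    except SyntaxError as err:
        return [f"does not parse: {err.msg}"]
    issues = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and call_name(node) in RISKY_CALLS:
            issues.append(f"line {node.lineno}: call to {call_name(node)}")
    return issues


def recommend(candidates: List[str]) -> Optional[str]:
    """Return the first candidate that passes every check, or None."""
    for code in candidates:
        if not safety_issues(code):
            return code
    return None


if __name__ == "__main__":
    # Stand-ins for model output: the first suggestion is flagged and skipped.
    suggestions = ["import os\nos.system(user_input)", "print('hello')"]
    print(recommend(suggestions))
```

The design choice here is to filter suggestions after generation rather than change the model itself, which is the kind of tooling that open weights make easy to experiment with. A check this simple would miss most real vulnerabilities, so it should be read as a shape for the idea rather than a usable security filter.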
In June 2021, GitHub, a subsidiary of Microsoft, launched Copilot, a plug-in for code editors that auto-completes sections of code based on the first line or a comment typed by the user. Copilot uses a version of OpenAI’s GPT, the large language model behind ChatGPT. That model is trained further on code that GitHub stores for developers and, reportedly, on code annotated by paid contractors.
GitHub is facing a lawsuit over the use of some open source code in its training data, and Masad believes Meta limited its training data to avoid such issues. Copilot costs $10 per month for individuals and $19 per user, per month, for businesses.