Today, we are excited to announce that customers can now deploy Code Llama models, developed by Meta, to run inference with one click through Amazon SageMaker JumpStart. Code Llama is a state-of-the-art large language model (LLM) capable of generating code and natural language about code from both code and natural language prompts. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML. This post walks you through how to discover and deploy the Code Llama model through SageMaker JumpStart.
Code Llama
Code Llama is a model released by Meta that builds on top of Llama 2. This state-of-the-art model is designed to improve productivity for programming tasks by helping developers create high-quality, well-documented code. The model excels in Python, C++, Java, PHP, C#, TypeScript, and Bash, and has the potential to save developers time and make software workflows more efficient.
It is available in three variants designed to cover a wide variety of applications: a foundational model (Code Llama), a Python-specialized model (Code Llama Python), and an instruction-following model for understanding natural language instructions (Code Llama Instruct). All Code Llama variants come in four sizes: 7B, 13B, 34B, and 70B parameters. The 7B and 13B base and instruct variants support infilling based on surrounding content, making them ideal for code assistant applications. The models were designed using Llama 2 as the base and then trained on 500 billion tokens of code data, with the Python-specialized version trained on an additional 100 billion tokens. The Code Llama models provide stable generations with up to 100,000 tokens of context. All models are trained on sequences of 16,000 tokens and show improvements on inputs with up to 100,000 tokens.
This model is available under the same community license as Llama 2.
Foundation models in SageMaker
SageMaker JumpStart provides access to a variety of models from popular model hubs, including Hugging Face, PyTorch Hub, and TensorFlow Hub, which you can use within your ML development workflow in SageMaker. Recent advances in ML have given rise to a new class of models known as foundation models, which are typically trained on billions of parameters and are adaptable to a wide category of use cases, such as text summarization, digital art generation, and language translation. Because these models are expensive to train, customers want to use existing pre-trained foundation models and fine-tune them as needed, rather than training these models themselves. SageMaker provides a curated list of models for you to choose from on the SageMaker console.
Within SageMaker JumpStart, you can find foundation models from different model providers, enabling you to get started with foundation models quickly. You can find foundation models based on different tasks or model providers, and easily review model characteristics and usage terms. You can also try out these models using the test UI widget. When you want to use a foundation model at scale, you can do so without leaving SageMaker by using pre-built notebooks from model providers. Because the models are hosted and deployed on AWS, you can rest assured that your data, whether used for evaluating the model or using it at scale, is never shared with third parties.
Explore the Code Llama model in SageMaker JumpStart
To deploy the Code Llama 70B model, complete the following steps in Amazon SageMaker Studio:
- On the SageMaker Studio home page, choose JumpStart in the navigation pane.
- Search for Code Llama models and select the Code Llama 70B model from the list of models displayed.
You can find detailed information about the model in the Code Llama 70B model card.
The following screenshot shows the endpoint settings. You can change the options or use the default options.
- Agree to the End User License Agreement (EULA) and choose Deploy.
This starts the endpoint deployment process as shown in the following screenshot.
Deploy a model using the SageMaker Python SDK
Alternatively, you can deploy through the example notebook by choosing Open notebook within the model detail page of Classic Studio. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.
To deploy using the notebook, you start by selecting an appropriate model, specified by the model_id. You can deploy the selected model on SageMaker with the following code.
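The block below is a minimal sketch of that deployment; the model_id shown here is an assumed identifier for Code Llama 70B, so verify the exact value in the JumpStart model card:

```python
from sagemaker.jumpstart.model import JumpStartModel

# The model_id below is an assumed identifier for Code Llama 70B;
# confirm the exact value in the SageMaker JumpStart model card.
model = JumpStartModel(model_id="meta-textgeneration-llama-codellama-70b")

# accept_eula must be changed to True to deploy successfully (see below).
predictor = model.deploy(accept_eula=False)
```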
This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. By default, accept_eula is set to False. You need to set accept_eula=True to deploy the endpoint successfully. By doing so, you accept the user license agreement and acceptable use policy mentioned previously. You can also download the license agreement.
Calling a SageMaker endpoint
After the endpoint is deployed, you can run inference using Boto3 or the SageMaker Python SDK. The following code uses the SageMaker Python SDK to invoke the model for inference and print the response.
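A minimal sketch of that code, assuming the predictor from the previous section and that the endpoint returns a list of dictionaries with a generated_text field (the response schema is an assumption; adjust it to the model's actual output format):

```python
def print_response(payload, response):
    # Echo the prompt, then print the model's generated continuation.
    print(payload["inputs"])
    print(f"> {response[0]['generated_text']}")
    print("\n==================================\n")

# Illustrative invocation of the deployed endpoint.
payload = {"inputs": "def hello_world():", "parameters": {"max_new_tokens": 32}}
response = predictor.predict(payload)
print_response(payload, response)
```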
The print_response function takes the payload (consisting of the input prompt and inference parameters) and the model response, and prints the generated output. Code Llama supports the following parameters while performing inference:
- max_length – The model generates text until the output length (which includes the input context length) reaches max_length. If specified, it must be a positive integer.
- max_new_tokens – The model generates text until the output length (excluding the input context length) reaches max_new_tokens. If specified, it must be a positive integer.
- num_beams – This specifies the number of beams used in the beam search. If specified, it must be an integer greater than or equal to num_return_sequences.
- no_repeat_ngram_size – The model ensures that a sequence of words of no_repeat_ngram_size is not repeated in the output sequence. If specified, it must be a positive integer greater than 1.
- temperature – This controls the randomness in the output. A higher temperature results in an output sequence with low-probability words; a lower temperature results in an output sequence with high-probability words. If temperature is 0, it results in greedy decoding. If specified, it must be a positive float.
- early_stopping – If True, text generation is finished when all beam hypotheses reach the end-of-sentence token. If specified, it must be Boolean.
- do_sample – If True, the model samples the next word according to its likelihood. If specified, it must be Boolean.
- top_k – In each step of text generation, the model samples from only the top_k most likely words. If specified, it must be a positive integer.
- top_p – In each step of text generation, the model samples from the smallest possible set of words with cumulative probability top_p. If specified, it must be a float between 0 and 1.
- return_full_text – If True, the input text will be part of the generated output text. If specified, it must be Boolean. The default value is False.
- stop – If specified, it must be a list of strings. Text generation stops if any of the specified strings is generated.
You can specify any subset of these parameters while invoking an endpoint. Next, we show examples of how to invoke an endpoint with these arguments.
Code completion
The following example shows how to perform code completion, where the expected endpoint response is the natural continuation of the prompt.
First, run the following code:
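A sketch of such a request, assuming the predictor and print_response helper defined earlier; the prompt and parameter values are illustrative:

```python
# Illustrative prompt: the model should continue the function body.
prompt = """\
import socket

def ping_exponential_backoff(host: str):"""

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9},
}
response = predictor.predict(payload)
print_response(payload, response)
```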
We get the following result:
Next, we run the following code:
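Another sketch with a different illustrative prompt, again assuming the same predictor and helper:

```python
# Illustrative prompt: the model should complete the main guard.
prompt = """\
import argparse

def main(string: str):
    print(string)
    print(string[::-1])

if __name__ == "__main__":"""

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9},
}
response = predictor.predict(payload)
print_response(payload, response)
```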
We get the following result:
Code generation
The following example shows Python code generation using Code Llama.
First, run the following code:
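A sketch of a generation request, where the natural language prompt below is illustrative:

```python
prompt = "Write a Python function to compute the nth Fibonacci number."

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9},
}
response = predictor.predict(payload)
print_response(payload, response)
```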
We get the following result:
Next, we run the following code:
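Another sketch with a different illustrative natural language prompt:

```python
prompt = "Write a Python program that checks whether a given string is a palindrome."

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9},
}
response = predictor.predict(payload)
print_response(payload, response)
```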
We get the following result:
These are examples of code-related tasks using Code Llama 70B. You can use the model to generate even more complex code. We encourage you to try it out using your own code-related use cases and examples!
Clean up
After you're done testing the endpoint, delete the SageMaker inference endpoint and the model to avoid incurring charges. Use the following code:
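A minimal sketch, using the predictor from earlier:

```python
# Delete the deployed model and the inference endpoint to stop incurring charges.
predictor.delete_model()
predictor.delete_endpoint()
```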
Conclusion
This post introduced Code Llama 70B in SageMaker JumpStart. Code Llama 70B is a state-of-the-art model for generating code not only from code itself, but also from natural language prompts. With SageMaker JumpStart, you can deploy the model in a few simple steps and then use it to carry out code-related tasks such as code generation and code infilling. As a next step, try the model with your own code-related use cases and data.
About the author