Today, we are excited to announce that customers can now deploy Code Llama models, developed by Meta, to run inference with one click through Amazon SageMaker JumpStart. Code Llama is a state-of-the-art large language model (LLM) capable of generating code and natural language about code from both code and natural language prompts. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML. This post walks you through how to discover and deploy the Code Llama model through SageMaker JumpStart.
Code Llama
Code Llama is a model released by Meta that builds on top of Llama 2. This state-of-the-art model is designed to improve productivity for programming tasks by helping developers create high-quality, well-documented code. The model excels in Python, C++, Java, PHP, C#, TypeScript, and Bash, and has the potential to save developers time and make software workflows more efficient.
It is available in three variants designed to cover a wide variety of applications: a foundational model (Code Llama), a Python-specialized model (Code Llama Python), and an instruction-following model for understanding natural language instructions (Code Llama Instruct). All Code Llama variants come in four sizes: 7B, 13B, 34B, and 70B parameters. The 7B and 13B base and instruct variants support infilling based on surrounding content, making them ideal for code assistant applications. The models were designed using Llama 2 as the base and then trained on 500 billion tokens of code data, with the Python-specialized version trained on an additional 100 billion tokens. The Code Llama models provide stable generations with up to 100,000 tokens of context. All models are trained on sequences of 16,000 tokens and show improvements on inputs with up to 100,000 tokens.
This model is available under the same community license as Llama 2.
Foundation models in SageMaker
SageMaker JumpStart provides access to a variety of models from popular model hubs, including Hugging Face, PyTorch Hub, and TensorFlow Hub, which you can use within your ML development workflow in SageMaker. Recent advances in ML have given rise to a new class of models known as foundation models, which are typically trained on billions of parameters and are adaptable to a wide category of use cases, such as text summarization, digital art generation, and language translation. Because these models are expensive to train, customers want to use existing pre-trained foundation models and fine-tune them as needed, rather than training these models themselves. SageMaker provides a curated list of models for you to choose from on the SageMaker console.
Within SageMaker JumpStart, you can find foundation models from different model providers, enabling you to get started with foundation models quickly. You can find foundation models based on different tasks or model providers, and easily review model characteristics and usage terms. You can also try out these models using the test UI widget. When you want to use a foundation model at scale, you can do so without leaving SageMaker by using pre-built notebooks from model providers. Because the models are hosted and deployed on AWS, you can rest assured that your data, whether used for evaluating the model or using it at scale, is never shared with third parties.
Explore the Code Llama model in SageMaker JumpStart
To deploy the Code Llama 70B model, complete the following steps in Amazon SageMaker Studio:
- On the SageMaker Studio home page, choose JumpStart in the navigation pane.
- Search for Code Llama models and select the Code Llama 70B model from the list of models displayed.
You can find detailed information about the model in the Code Llama 70B model card.
The following screenshot shows the endpoint settings. You can change the options or use the default options.
- Agree to the End User License Agreement (EULA) and choose Deploy.
This starts the endpoint deployment process as shown in the following screenshot.
Deploy a model using the SageMaker Python SDK
Alternatively, you can deploy through the example notebook by choosing Open notebook within the model detail page of Classic Studio. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.
To deploy using the notebook, you start by selecting an appropriate model, specified by the model_id. You can deploy the selected model on SageMaker with the following code.
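The block below is a minimal sketch of that deployment; the model_id shown here is an assumed identifier for Code Llama 70B, so verify the exact value in the JumpStart model card:

```python
from sagemaker.jumpstart.model import JumpStartModel

# The model_id below is an assumed identifier for Code Llama 70B;
# confirm the exact value in the SageMaker JumpStart model card.
model = JumpStartModel(model_id="meta-textgeneration-llama-codellama-70b")

# accept_eula must be changed to True to deploy successfully (see below).
predictor = model.deploy(accept_eula=False)
```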
This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. By default, accept_eula is set to False. You need to set accept_eula=True to deploy the endpoint successfully. By doing so, you accept the user license agreement and acceptable use policy mentioned previously. You can also download the license agreement.
Calling a SageMaker endpoint
After the endpoint is deployed, you can run inference using Boto3 or the SageMaker Python SDK. The following code uses the SageMaker Python SDK to invoke the model for inference and print the response.
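A minimal sketch of that code, assuming the predictor from the previous section and that the endpoint returns a list of dictionaries with a generated_text field (the response schema is an assumption; adjust it to the model's actual output format):

```python
def print_response(payload, response):
    # Echo the prompt, then print the model's generated continuation.
    print(payload["inputs"])
    print(f"> {response[0]['generated_text']}")
    print("\n==================================\n")

# Illustrative invocation of the deployed endpoint.
payload = {"inputs": "def hello_world():", "parameters": {"max_new_tokens": 32}}
response = predictor.predict(payload)
print_response(payload, response)
```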
The print_response function takes the payload (consisting of the input prompt and inference parameters) and the model response, and prints the generated output. Code Llama supports the following parameters while performing inference:
- max_length – The model generates text until the output length (which includes the input context length) reaches max_length. If specified, it must be a positive integer.
- max_new_tokens – The model generates text until the output length (excluding the input context length) reaches max_new_tokens. If specified, it must be a positive integer.
- num_beams – This specifies the number of beams used in the beam search. If specified, it must be an integer greater than or equal to num_return_sequences.
- no_repeat_ngram_size – The model ensures that a sequence of words of no_repeat_ngram_size is not repeated in the output sequence. If specified, it must be a positive integer greater than 1.
- temperature – This controls the randomness in the output. A higher temperature results in an output sequence with low-probability words; a lower temperature results in an output sequence with high-probability words. If temperature is 0, it results in greedy decoding. If specified, it must be a positive float.
- early_stopping – If True, text generation is finished when all beam hypotheses reach the end-of-sentence token. If specified, it must be Boolean.
- do_sample – If True, the model samples the next word according to its likelihood. If specified, it must be Boolean.
- top_k – In each step of text generation, the model samples from only the top_k most likely words. If specified, it must be a positive integer.
- top_p – In each step of text generation, the model samples from the smallest possible set of words with cumulative probability top_p. If specified, it must be a float between 0 and 1.
- return_full_text – If True, the input text will be part of the generated output text. If specified, it must be Boolean. The default value is False.
- stop – If specified, it must be a list of strings. Text generation stops if any of the specified strings is generated.
You can specify any subset of these parameters while invoking an endpoint. Next, we show examples of how to invoke an endpoint with these arguments.
Code completion
The following example shows how to perform code completion, where the expected endpoint response is the natural continuation of the prompt.
First, run the following code:
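A sketch of such a request, assuming the predictor and print_response helper defined earlier; the prompt and parameter values are illustrative:

```python
# Illustrative prompt: the model should continue the function body.
prompt = """\
import socket

def ping_exponential_backoff(host: str):"""

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9},
}
response = predictor.predict(payload)
print_response(payload, response)
```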
We get the following result:
Next, we run the following code:
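Another sketch with a different illustrative prompt, again assuming the same predictor and helper:

```python
# Illustrative prompt: the model should complete the main guard.
prompt = """\
import argparse

def main(string: str):
    print(string)
    print(string[::-1])

if __name__ == "__main__":"""

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9},
}
response = predictor.predict(payload)
print_response(payload, response)
```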
We get the following result:
Code generation
The following example shows Python code generation using Code Llama.
First, run the following code:
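A sketch of a generation request, where the natural language prompt below is illustrative:

```python
prompt = "Write a Python function to compute the nth Fibonacci number."

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9},
}
response = predictor.predict(payload)
print_response(payload, response)
```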
We get the following result:
Next, we run the following code:
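Another sketch with a different illustrative natural language prompt:

```python
prompt = "Write a Python program that checks whether a given string is a palindrome."

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9},
}
response = predictor.predict(payload)
print_response(payload, response)
```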
We get the following result:
These are examples of code-related tasks using Code Llama 70B. You can use the model to generate even more complex code. We encourage you to try it out using your own code-related use cases and examples!
Clean up
After you're done testing the endpoint, delete the SageMaker inference endpoint and the model to avoid incurring charges. Use the following code:
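A minimal sketch, using the predictor from earlier:

```python
# Delete the deployed model and the inference endpoint to stop incurring charges.
predictor.delete_model()
predictor.delete_endpoint()
```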
Conclusion
This post introduced Code Llama 70B in SageMaker JumpStart. Code Llama 70B is a state-of-the-art model for generating code not only from code itself, but also from natural language prompts. With SageMaker JumpStart, you can deploy the model in a few simple steps and then use it to carry out code-related tasks such as code generation and code infilling. As a next step, try the model with your own code-related use cases and data.
About the author