imported goods
Start by installing and importing the required Python libraries.
!pip install llama-index
!pip install llama-index-embeddings-huggingface
!pip install peft
!pip install auto-gptq
!pip install optimum
!pip install bitsandbytes
# if not running on Colab ensure transformers is installed too
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor
Knowledge Base Settings
You can construct a knowledge base by defining the embedding model, chunk size, and chunk overlap. Here we use BAAI’s ~33M parameter bge-small-en-v1.5 embedding model, available on the Hugging Face hub. Different embedding model options are available in this text embedding leaderboard.
# import any embedding model on HF hub
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")Settings.llm = None # we won't use LlamaIndex to set up LLM
Settings.chunk_size = 256
Settings.chunk_overlap = 25
Next, load the source document. There is a folder here called ‘.article,” which includes PDF versions of three Medium articles I wrote for Fat Tails. If you’re running on Colab, you’ll need to download the article folder from your GitHub repository and manually upload it to your Colab environment.
For each file in this folder, the function below reads the text from the PDF, splits it into chunks (based on the settings defined earlier), and stores each chunk in a list like this: document.
documents = SimpleDirectoryReader("articles").load_data()
Because the blog was downloaded directly from Medium as a PDF, it looks more like a webpage than a well-organized article. Therefore, some chunks may contain text unrelated to the article, such as webpage headers and Medium article recommendations.
In the code block below, we’ve trimmed chunks of the document, removing most of the chunks before or after the core of the article.
print(len(documents)) # prints: 71
for doc in documents:
if "Member-only story" in doc.text:
documents.remove(doc)
continueif "The Data Entrepreneurs" in doc.text:
documents.remove(doc)
if " min read" in doc.text:
documents.remove(doc)
print(len(documents)) # prints: 61
Finally, the refined chunks can be stored in a vector database.
index = VectorStoreIndex.from_documents(documents)
Retriever settings
Once you have your knowledge base in place, you can create a finder using LlamaIndex. vector index retriever(), Returns the top 3 chunks that are most similar to the user’s query.
# set number of docs to retreive
top_k = 3# configure retriever
retriever = VectorIndexRetriever(
index=index,
similarity_top_k=top_k,
)
Next, we define a query engine that uses a browser and a query to return a set of related chunks.
# assemble query engine
query_engine = RetrieverQueryEngine(
retriever=retriever,
node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.5)],
)
Use query engine
Now that we have our knowledge base and search system set up, let’s use it to return chunks relevant to our query. Here I’ll ask the same technical questions I asked ShawGPT (a YouTube comment responder) in a previous article.
query = "What is fat-tailedness?"
response = query_engine.query(query)
The query engine returns a response object containing the text, metadata, and index of the relevant chunk. The code block below returns a more readable version of this information.
# reformat response
context = "Context:\n"
for i in range(top_k):
context = context + response.source_nodes[i].text + "\n\n"print(context)
Context:
Some of the controversy might be explained by the observation that log-
normal distributions behave like Gaussian for low sigma and like Power Law
at high sigma [2].
However, to avoid controversy, we can depart (for now) from whether some
given data fits a Power Law or not and focus instead on fat tails.
Fat-tailedness — measuring the space between Mediocristan
and Extremistan
Fat Tails are a more general idea than Pareto and Power Law distributions.
One way we can think about it is that “fat-tailedness” is the degree to which
rare events drive the aggregate statistics of a distribution. From this point of
view, fat-tailedness lives on a spectrum from not fat-tailed (i.e. a Gaussian) to
very fat-tailed (i.e. Pareto 80 – 20).
This maps directly to the idea of Mediocristan vs Extremistan discussed
earlier. The image below visualizes different distributions across this
conceptual landscape [2].print("mean kappa_1n = " + str(np.mean(kappa_dict[filename])))
print("")
Mean κ (1,100) values from 1000 runs for each dataset. Image by author.
These more stable results indicate Medium followers are the most fat-tailed,
followed by LinkedIn Impressions and YouTube earnings.
Note: One can compare these values to Table III in ref [3] to better understand each
κ value. Namely, these values are comparable to a Pareto distribution with α
between 2 and 3.
Although each heuristic told a slightly different story, all signs point toward
Medium followers gained being the most fat-tailed of the 3 datasets.
Conclusion
While binary labeling data as fat-tailed (or not) may be tempting, fat-
tailedness lives on a spectrum. Here, we broke down 4 heuristics for
quantifying how fat-tailed data are.
Pareto, Power Laws, and Fat Tails
What they don’t teach you in statistics
towardsdatascience.com
Although Pareto (and more generally power law) distributions give us a
salient example of fat tails, this is a more general notion that lives on a
spectrum ranging from thin-tailed (i.e. a Gaussian) to very fat-tailed (i.e.
Pareto 80 – 20).
The spectrum of Fat-tailedness. Image by author.
This view of fat-tailedness provides us with a more flexible and precise way of
categorizing data than simply labeling it as a Power Law (or not). However,
this begs the question: how do we define fat-tailedness?
4 Ways to Quantify Fat Tails
Add RAG to LLM
Start by downloading the fine-tuned model from the Hugging Face hub.
# load fine-tuned model from hub
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizermodel_name = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name,
device_map="auto",
trust_remote_code=False,
revision="main")
config = PeftConfig.from_pretrained("shawhin/shawgpt-ft")
model = PeftModel.from_pretrained(model, "shawhin/shawgpt-ft")
# load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
Essentially, you can see how the model responds to technical questions without the context of the article. To do this, we create a prompt template using a lambda function that takes viewer comments and returns a prompt for LLM. For more information about where this message came from, see previous articles in this series.
# prompt (no context)
intstructions_string = f"""ShawGPT, functioning as a virtual data science \
consultant on YouTube, communicates in clear, accessible language, escalating \
to technical depth upon request. It reacts to feedback aptly and ends \
responses with its signature '–ShawGPT'.ShawGPT will tailor the length of its responses to match the viewer's comment, \
providing concise acknowledgments to brief expressions of gratitude or \
feedback, thus keeping the interaction natural and engaging.
Please respond to the following comment.
"""
prompt_template = lambda comment: f'''[INST] {intstructions_string} \n{comment} \n[/INST]'''
comment = "What is fat-tailedness?"
prompt = prompt_template(comment)
print(prompt)
[INST] ShawGPT, functioning as a virtual data science consultant on YouTube, communicates in clear, accessible language, escalating to technical depth upon request. It reacts to feedback aptly and ends responses with its signature '–ShawGPT'. ShawGPT will tailor the length of its responses to match the viewer's comment, providing concise acknowledgments to brief expressions of gratitude or feedback, thus keeping the interaction natural and engaging.Please respond to the following comment.
What is fat-tailedness?
[/INST]
Next, you can pass this prompt to your model using the code below.
model.eval()inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"),
max_new_tokens=280)
print(tokenizer.batch_decode(outputs)[0])
The model’s response is (no context):
Great question!Fat-tailedness is a statistical property of a distribution. In simple terms,
it refers to the presence of extreme outliers or heavy tails in the
distribution.
For instance, consider the distribution of heights in a population. A normal
distribution would have most people clustered around an average height with a
few people deviating slightly from the mean. However, in a fat-tailed
distribution, you would observe a larger number of people being significantly
taller or shorter than the average.
In finance, fat-tailedness is often used to describe the presence of extreme
market moves or outliers in financial returns. It's an important concept in
risk management and financial modeling.
I hope this explanation helps! Let me know if you have any questions.
–ShawGPT
The style and format of the response is good, but the model’s description of the fat tail is different from how I defined it in the video and blog series.
Let’s see what happens to the model’s response when we include the appropriate context. To do this, we create another prompt template that can get the context of the RAG system.
# prompt (with context)
prompt_template_w_context = lambda context, comment: f"""[INST]ShawGPT, functioning as a virtual data science consultant on YouTube, communicates in clear, accessible language, escalating to technical depth upon request. \
It reacts to feedback aptly and ends responses with its signature '–ShawGPT'. \
ShawGPT will tailor the length of its responses to match the viewer's comment, providing concise acknowledgments to brief expressions of gratitude or feedback, \
thus keeping the interaction natural and engaging.{context}
Please respond to the following comment. Use the context above if it is helpful.
{comment}
[/INST]
"""
Next, we pass a prompt containing the query engine’s context and view annotations to the model.
prompt = prompt_template_w_context(context, comment)inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=280)
print(tokenizer.batch_decode(outputs)[0])
Here is the new response (with context):
Great question!Fat-tailedness is a property of a distribution that describes how much the
presence of extreme values (rare events) affects the overall statistics of the
distribution.
In the context of the video, the author discusses how Mediocristan and
Extremistan distributions differ in terms of fat-tailedness. Mediocristan
distributions have a relatively small impact from extreme values, while
Extremistan distributions are heavily influenced by extreme values.
The author also mentions that fat-tailedness is a more general concept than
specific distributions like Pareto or Power Laws. Instead, it lives on a
spectrum ranging from thin-tailed (Gaussian) to very fat-tailed (Pareto 80-20).
I hope that helps clarify things a bit! Let me know if you have any questions.
–ShawGPT
This captures the explanation of the fat tail much better than the context-free response and even evokes the niche concepts of Mediocristan and Extremistan.
Here I introduced a beginner-friendly RAG and shared a specific example of how to implement a RAG using LlamaIndex. RAGs allow you to enhance your LLM system with updatable domain-specific knowledge.
While most of the recent AI hype has focused on building AI assistants, a powerful (yet less popular) innovation has come from text embeddings (i.e. the ones we used for search). In the next article in this series Text Embedding More details, including how to use it search for meaning and classification task.
More about LLMs 👇