LangChain: Powerful framework for Generative AI
LangChain is a framework engineered to simplify the integration and deployment of Large Language Models (LLMs) in applications, enabling powerful natural language interactions across diverse technological systems. Delve into this article to discover the influence of LangChain in today's digital landscape.
This article is also available in Vietnamese.
Large Language Models (LLMs) are advanced machine learning models that decode and generate human language; they have evolved significantly from their simpler predecessors thanks to breakthroughs in computational power. LLMs have revolutionized the field of natural language processing (NLP), paving the way for many applications in the tech world and quickly establishing their place in our digital age. Today, LLMs can be found everywhere, from chatbots to content generation systems, and even in our daily interactions such as search tools or content suggestions on social networks.
Integrating and deploying Large Language Models (LLMs) in application design often requires specialized knowledge and a dedicated toolkit. With the trend of integrating AI into systems for natural language processing, LangChain is a tool created to help chain, coordinate, and interact with LLMs in a structured way. Compared to traditional methods, LangChain makes integrating LLMs into systems easy, allowing dynamic interaction with various data sources to deliver rich experiences instead of just making standard API calls. Additionally, LangChain helps developers build agents capable of reasoning and breaking down problems, and it brings context and memory into task processing.
In this article, let's explore LangChain - a powerful framework for building applications powered by Large Language Models.
Overview
The LangChain framework provides modules that let you manage various aspects of interacting with LLMs. These modules include:
- Model I/O: This is the foundational component for communicating with language models. It offers an interface for seamless interaction with any language model. Specifically:
  - Inputs are defined by Prompts - a collection of instructions or inputs provided by the user to help the language model understand the context and generate coherent responses.
  - The processing part is the language model itself; LangChain integrates with two main types: LLMs (accepting an input string → returning an output string) and Chat models (accepting a list of Chat Messages → returning a Chat Message).
  - Additionally, this module contains output parsers that convert raw text results into organized, structured information.

With just the Model I/O module, users can easily interact with and integrate LLMs into their applications.

Model I/O module
- Retrieval: This component facilitates the integration of user-specific data into the generation phase of the language model using Retrieval Augmented Generation (RAG). LangChain supports functionalities like document loaders, document transformers, text embedding models, and various retrieval algorithms, as well as ways to store data in vector form, ensuring efficient and context-appropriate data usage.
Retrieval module
- Chains: This component helps design complex processes by connecting multiple LLMs or other components, including other chains. This approach is both simple and effective, aiding the development of intricate applications and improving maintainability. An example of a Chain: a sequence that takes user input, converts it with a PromptTemplate, processes it with an LLM, and then aggregates the results (see the sketch after this list).
- Agents: This component allows applications to use the language model as an inference tool to decide the order of actions flexibly. LangChain offers various Agent types combined with diverse Tools to help agents interact with general utilities, with Chains, or with other agents, forming a robust framework to handle complex tasks.
- Memory: This component is crucial for applications with a dialogue interface, allowing the inclusion of previous conversation flows into the LLM, maintaining the user interaction context. LangChain provides numerous utilities to integrate memory into the system, supporting basic operations like reading and writing, ensuring the system can access past messages or maintain continuous updates.
- Callbacks: This component allows the registration of events to hook into different stages of the LLM pipeline, useful for tasks like logging, monitoring, and streaming. These events are handled by the CallbackHandlers registered for them. Additionally, LangChain offers some built-in handlers, such as StdOutCallbackHandler, which logs all events to standard output.
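To make the Chain example above concrete, here is a minimal sketch using the same classic LangChain API as the demo below. The VertexAI model and the prompt text are illustrative placeholders; any LLM supported by LangChain works.

from langchain.llms import VertexAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# A tiny chain: user input -> PromptTemplate -> LLM -> output string
# (assumes Google Cloud credentials are already configured in the environment)
prompt = PromptTemplate(
    template="Suggest one cloud design pattern for this requirement: {requirement}",
    input_variables=["requirement"],
)
chain = LLMChain(llm=VertexAI(), prompt=prompt)
print(chain.run(requirement="decouple a slow backend from the web frontend"))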
Next, let's explore some features that LangChain provides through an actual demo.
Demo
Demo objective
In this demo, we will build a Q&A application that uses an LLM to generate responses. You can build a general Q&A application or one on a chosen topic; here we choose Cloud Design Patterns. The demo will use VertexAI.
You can also use alternative LLMs such as OpenAI or Azure OpenAI; learn more about the LLMs supported by LangChain here.
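As a hedged illustration (not part of the demo code), switching providers in LangChain usually only changes the LLM class you instantiate:

from langchain.llms import OpenAI, VertexAI

llm = VertexAI(max_output_tokens=1000)                 # used in this demo
# llm = OpenAI(model_name="gpt-3.5-turbo-instruct")    # drop-in alternative (requires OPENAI_API_KEY)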
The input is the user's question about Cloud Design Patterns.
The output lists the design patterns relevant to the question, as answered by the LLM (VertexAI in this demo).
Question: Which design pattern should I use when...
Answer:
- <Pattern 1>: <reason to choose this pattern>
- <Pattern 2>: <reason to choose this pattern>
- <Pattern 3>: <reason to choose this pattern>
Applying LangChain's Model I/O module to the demo
Install libraries
We install the necessary libraries as follows:
pip install langchain google-cloud-aiplatform
Import libraries
from langchain.llms import VertexAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser
from google.oauth2 import service_account
from langchain.pydantic_v1 import BaseModel, Field
from typing import List
Defining the Structure of the Returned Result
First, we need to define the structure of the returned result; from it, we create the format_instructions that will be embedded in the prompt template in the next step. LangChain supports many types of OutputParser suitable for different needs. Here, we want the result returned in JSON format, so we will use PydanticOutputParser.
We define Classes describing the structure of the returned result:
# Define the DesignPattern model
class DesignPattern(BaseModel):
    pattern: str = Field(description="name of design pattern")
    reason: str = Field(description="reason to choose this design pattern")

# Define the Response model
class Response(BaseModel):
    answer: List[DesignPattern]
Then, we create the corresponding OutputParser object:
# Initialize the output parser with the Response model
output_parser = PydanticOutputParser(pydantic_object=Response)
Define prompt template
Next, we create the prompt template as follows:
# Define the prompt template and format instructions
format_instructions = output_parser.get_format_instructions()
prompt_template = """
You are a helpful assistant that can answer questions about cloud design pattern.
Answer the following question: {question}
List out name of top 3 suitable design patterns and brief explain reason
{format_instructions}
"""
# Create a prompt instance with the defined template
PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions}
)
In the above code, we:
- Use the OutputParser from the previous step to create format_instructions. In this demo, the created format_instructions will look like the following and will be added to the prompt template:
The output should be formatted as a JSON instance that conforms to the JSON schema below.
As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.
Here is the output schema:
```
{"properties": {"answer": {"title": "Answer", "type": "array", "items": {"$ref": "#/definitions/DesignPattern"}}}, "required": ["answer"], "definitions": {"DesignPattern": {"title": "DesignPattern", "type": "object", "properties": {"pattern": {"title": "Pattern", "description": "name of design pattern", "type": "string"}, "reason": {"title": "Reason", "description": "reason to choose this design pattern", "type": "string"}}, "required": ["pattern", "reason"]}}}
```
- The created prompt template will have one variable that needs to be passed in for processing, which is "question" - the user's query.
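As an optional sanity check (not part of the original demo), you can render the final prompt locally to see how the "question" input and the partially applied "format_instructions" are combined:

# The question below is only an example for inspecting the rendered prompt
print(PROMPT.format(question="Which design pattern should I use for retrying transient failures?"))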
Creating LLM and Execution
Next, we create the LLM object and the chain, then execute them to get the results below:
# Initialize service account credentials
credentials = service_account.Credentials.from_service_account_file("<path_to_service_account>")
# Initialize the VertexAI and LLMChain instances
llm = VertexAI(project="<your_gcp_project>", credentials=credentials, max_output_tokens=1000)
chain = LLMChain(llm=llm, prompt=PROMPT)
# Define the question and print it
question = "Which design pattern should I use when design a webpage with long-time-processing backend"
print(f"Question: {question}")
# Run the chain to get the response
response = chain.run(question=question)
response_data = output_parser.parse(response)
# Print the design patterns and their reasons
print("Answer: ")
for design_pattern in response_data.answer:
    print(f"- {design_pattern.pattern}: {design_pattern.reason}")
In the code segment above, connecting to and using VertexAI through LangChain is remarkably straightforward and concise, yet still customizable with the parameters you need; a sketch of common parameters follows.
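For example, commonly tuned VertexAI parameters look roughly like this (the values here are illustrative, not those used in the demo):

llm = VertexAI(
    project="<your_gcp_project>",
    credentials=credentials,
    model_name="text-bison",   # underlying Vertex AI model
    temperature=0.2,           # lower values give more deterministic answers
    max_output_tokens=1000,    # limit on generated tokens
    top_p=0.95,
    top_k=40,
)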
Results:
Question: Which design pattern should I use when design a webpage with long-time-processing backend
Answer:
- Asynchronous Messaging: Asynchronous messaging allows the backend to process the request without blocking the user interface. This is important for long-time-processing tasks, as it allows the user to continue using the application while the task is being processed.
- Command Query Responsibility Segregation (CQRS): CQRS separates the read and write operations on a database, which can improve performance for long-time-processing tasks. This is because read operations can be performed without blocking write operations, and vice versa.
- Event Sourcing: Event sourcing stores all changes to a database as a sequence of events. This can make it easier to track and replay changes, which can be useful for long-time-processing tasks.
Add reference data to the application
So, we have gone through a simple demo using LangChain's Model I/O module to connect to and use an LLM. Next, we will enhance the demo by adding documentation for the LLM to reference when answering questions about Cloud Design Patterns; in this step, we will use Azure's documentation. To accomplish this, we will use the Document Loaders, Text Embedding Models, Vector Stores, and Retrievers of LangChain's Retrieval module.
Applying LangChain's Retrieval module to the demo
Import additional libraries
We import the additional modules as follows (note that PlaywrightURLLoader and FAISS also require the playwright and faiss-cpu packages to be installed):
from langchain.document_loaders import PlaywrightURLLoader
from langchain.embeddings import VertexAIEmbeddings
from langchain.vectorstores import FAISS
Update prompt template
We update the code to create the prompt template as follows:
prompt_template = """
You are a helpful assistant that can answer questions about cloud design pattern.
Answer the following question: {question}
Use this docs for references: {docs}
List out name of top 3 suitable design patterns and brief explain reason
{format_instructions}
"""
# Create a prompt instance with the defined template
PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["question", "docs"],
    partial_variables={"format_instructions": format_instructions}
)
In the above code, we have added the instruction “Use this docs for references: {docs}” for the LLM when answering questions and added the corresponding input variable "docs" for the prompt template.
Retrieving Data from Azure Documentation and Creating Reference Data
When the LLM executes, we need to pass reference_data through the input variable “docs”. We add the code segment as follows:
# Create reference data from url
urls = ["https://learn.microsoft.com/en-us/azure/architecture/patterns/"]
loader = PlaywrightURLLoader(urls=urls)
data = loader.load()
embeddings = VertexAIEmbeddings(project="<your_gcp_project>", credentials=credentials)
reference_data = FAISS.from_documents(data, embeddings)
In the above code:
- We use the Playwright library, wrapped by LangChain through PlaywrightURLLoader, to retrieve information from the URL.
- Embed the retrieved text using VertexAIEmbeddings and store the resulting vectors with the FAISS library (Facebook AI Similarity Search) for efficient similarity search.
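Optionally (not part of the original demo), longer pages can be split into smaller chunks before embedding, which usually improves retrieval quality; a sketch using LangChain's document transformer:

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Chunk sizes below are illustrative
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(data)
reference_data = FAISS.from_documents(chunks, embeddings)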
Executing the LLM
We update the LLM code segment as follows:
# Run the chain to get the response
docs = reference_data.similarity_search(question, k=4)
docs_page_content = " ".join([d.page_content for d in docs])
response = chain.run(question=question, docs=docs_page_content)
response_data = output_parser.parse(response)
We have now provided the LLM with additional reference information about Azure's Cloud Design Patterns. We obtain the following results:
Question: Which design pattern should I use when design a webpage with long-time-processing backend
Answer:
- Asynchronous Request-Reply: This pattern decouples backend processing from a frontend host, where backend processing needs to be asynchronous, but the frontend still needs a clear response.
- Cache-Aside: This pattern improves performance by loading data on demand into a cache from a data store.
- Circuit Breaker: This pattern handles faults that might take a variable amount of time to fix when connecting to a remote service or resource.
We can see that the answer now includes references from Azure's documentation.
Enhancing with RetrievalQA Chain
Through the enhanced demo, we have used the Retrieval module in LangChain to feed additional information into the LLM's processing. However, with the current code, the system has to re-run the step of fetching and processing data from the URL for every question, which is quite time-consuming. To speed things up, we could persist the reference_data using the Vector Stores supported by LangChain (a small sketch follows). In this demo, however, we will use LangChain's Chain module to create a RetrievalQA chain that combines the language model with retrieved data, allowing us to build the reference_data once and reuse it for multiple subsequent questions.
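As an aside, persisting the vector store is straightforward; a minimal sketch (the folder name is illustrative):

# Save the FAISS index to disk so the URL does not need to be re-fetched next time
reference_data.save_local("design_patterns_index")

# ...later, in a new session:
reference_data = FAISS.load_local("design_patterns_index", embeddings)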
Applying LangChain's Chain module to the demo
Import additional library
from langchain.chains import RetrievalQA
Update prompt template
In the prompt template, we replace the instruction for using reference data ("Use this docs for references: {docs}") with an instruction for using the retrieved context, and update the corresponding input variable accordingly. A sketch of the updated template follows.
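A sketch of the updated prompt template, assuming the "context" document variable that RetrievalQA's "stuff" chain type expects by default:

prompt_template = """
You are a helpful assistant that can answer questions about cloud design pattern.
Answer the following question: {question}
Use this context for references: {context}
List out name of top 3 suitable design patterns and brief explain reason
{format_instructions}
"""
# Create a prompt instance with the defined template
PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["question", "context"],
    partial_variables={"format_instructions": format_instructions}
)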
Creating RetrievalQA and Execution
We create the RetrievalQA object as follows:
# Create RetrievalQA
chain_type_kwargs = {"prompt": PROMPT}
retrievalqa = RetrievalQA.from_chain_type(llm=VertexAI(credentials=credentials, max_output_tokens=1000), chain_type="stuff", retriever=reference_data.as_retriever(), chain_type_kwargs=chain_type_kwargs)
Here, we set up the LLM object and the retriever directly within RetrievalQA. Next, we execute the run method of retrievalqa instead of the LLMChain object ("chain") from the earlier code. As a result, we can omit the code sections under the comments "Initialize the VertexAI and LLMChain instances" and "Run the chain to get the response".
Subsequently, we update the execution code segment as follows:
# Create a prompt instance with the defined template
...
# Initialize service account credentials
...
# Create reference data from url
...
# Create RetrievalQA
chain_type_kwargs = {"prompt": PROMPT}
retrievalqa = RetrievalQA.from_chain_type(llm=VertexAI(credentials=credentials, max_output_tokens=1000), chain_type="stuff", retriever=reference_data.as_retriever(), chain_type_kwargs=chain_type_kwargs)
# Define the questions and execute
questions = [
    "Which design pattern should I use when design a webpage with long-time-processing backend",
    "Which design pattern should I check when design a microserver system",
    "Which design pattern to prevent single point of failure"
]
for question in questions:
    print(f"Question: {question}")
    # Run the chain to get the response
    response = retrievalqa.run(question)
    response_data = output_parser.parse(response)
    # Print the design patterns and their reasons
    print("Answer: ")
    for design_pattern in response_data.answer:
        print(f"- {design_pattern.pattern}: {design_pattern.reason}")
    print()
Result:
Question: Which design pattern should I use when design a webpage with long-time-processing backend
Answer:
- Asynchronous Request-Reply: This pattern allows the backend processing to be asynchronous, while still providing a clear response to the frontend.
- Circuit Breaker: This pattern can handle faults that might take a variable amount of time to fix when connecting to a remote service or resource.
- Queue-Based Load Leveling: This pattern can help to smooth intermittent heavy loads on the backend.
Question: Which design pattern should I check when design a microserver system
Answer:
- Sidecar: Sidecar pattern is suitable for designing a microservice system because it allows components of an application to be deployed into a separate process or container, providing isolation and encapsulation. This can help to improve the scalability, reliability, and maintainability of a microservice system.
- Ambassador: The Ambassador pattern is a good choice for designing a microservice system because it allows you to create helper services that send network requests on behalf of a consumer service or application. This can help to improve the performance and scalability of your microservice system.
- Circuit Breaker: The Circuit Breaker pattern is a good choice for designing a microservice system because it allows you to handle faults that might take a variable amount of time to fix when connecting to a remote service or resource. This can help to improve the reliability and availability of your microservice system.
Question: Which design pattern to prevent single point of failure
Answer:
- Bulkhead: Bulkhead pattern isolates elements of an application into pools so that if one fails, the others will continue to function.
- Circuit Breaker: Circuit Breaker pattern handles faults that might take a variable amount of time to fix when connecting to a remote service or resource.
- Leader Election: Leader Election pattern coordinates the actions performed by a collection of collaborating task instances in a distributed application by electing one instance as the leader that assumes responsibility for managing the other instances.
From this point, we can further upgrade the application, for example by customizing how data is retrieved from the reference documents through additional parameters of the as_retriever method, or by using RetrievalQAWithSourcesChain to also return the sources the model used when answering (hedged sketches below). These are more advanced LangChain topics that we can discuss in another article.
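Hedged sketches of those two upgrades (parameter values are illustrative):

# Customize retrieval, e.g. use MMR and return more chunks
retriever = reference_data.as_retriever(search_type="mmr", search_kwargs={"k": 6})

# Also return the source documents used to produce the answer
from langchain.chains import RetrievalQAWithSourcesChain
qa_with_sources = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm, chain_type="stuff", retriever=retriever
)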
With the above demo, we utilized LangChain's Model I/O, Retrieval, and Chains modules to develop a question-answering application about design patterns using VertexAI, with Azure's documentation as a reference. Through the demo, we can see the functionalities LangChain offers and how to employ it for seamless interaction with LLMs and system integration.
Applications
Given its capabilities, LangChain can support building systems like:
- Question-Answering using reference documentation: LangChain allows users to provide context and ask specific questions. The system then intelligently interacts with the textual data to give precise answers.
- Code Completion Tool: Use LangChain to integrate OpenAI into code completion tools, assisting in proposing code that aligns with the developer's context.
- Summarization: LangChain can summarize various types of texts, such as calls, articles, books, academic papers, legal documents, user histories, or financial documents. This tool can efficiently extract essential information from multiple content types.
- Chatbot: In the realm of customer service, LangChain can enhance chatbots' ability to comprehend and respond to customer inquiries in multiple languages, amplifying their functionality and creating engaging user interactions.
- Interacting with APIs: LangChain can engage, comprehend user context, choose, and interact with suitable APIs, facilitating data search and processing from diverse sources more effortlessly.
Conclusion
LangChain isn't just a potent framework for interacting with and integrating LLMs into applications. It's also a versatile solution for numerous tasks in natural language processing. With the modules and functionalities that LangChain provides, building and deploying NLP applications becomes easier and more efficient than ever. Harness the power of LangChain to unlock new possibilities in today's digital world.