Back in 2022, Google's AI chatbot sucked; the joke was to call it a "clod" because it was stupid. Google has been behind the AI eight ball from the start, and whether that's from aging out or from the fact that AI will destroy Google's business model is a good question. To prop up its share price, Google has been more or less forced to spin up some new products, which is a bit like rearranging deck chairs on the Titanic. NotebookLM was their first fairly nice product: a RAG (retrieval-augmented generation) interface that lets you query your own files intelligently and can even generate a nice podcast, all for free, so thank you for that. To export the podcast's script to a file, just transform the audio into a video with e.g. CapCut or OpenShot, upload it to YouTube, wait for the auto-captions, then download them, and joy (a scripted version of that step is sketched below). Then Google touted its quantum computing breakthrough, which, while progress, is not a "game changer."

Now Google is adding nice vision tools to Gemini, and this is relevant for MEDICAL TECHNOLOGY. The fact is LLM AI will absolutely revolutionize medicine, not merely in diagnostics as illustrated in the video, but more importantly by essentially iterating through permutations of the genome (C, G, A, T) to determine which drugs work and which fail, and that WILL be a game changer. Expect cures to various cancers, plural (cancer is a catch-all term; treating it requires seeing cancers as organ-specific diseases). OK, I should have done medicine; too bad the system fucks over the poor, so here I am.
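About that NotebookLM caption trick: if you'd rather script the audio-to-video step than click through CapCut or OpenShot, here's a minimal sketch using ffmpeg from Python. It assumes ffmpeg is installed and on your PATH, and the file names (the podcast audio, a cover image) are placeholders for whatever you exported.

import subprocess

# Turn the NotebookLM podcast audio into a video YouTube will accept,
# so its auto-captioning can produce a downloadable transcript.
# Assumes ffmpeg is installed; file names below are placeholders.
audio_in = "notebooklm_podcast.mp3"   # audio exported from NotebookLM
video_out = "notebooklm_podcast.mp4"  # upload this to YouTube

subprocess.run([
    "ffmpeg",
    "-loop", "1",            # loop a single still image as the video track
    "-i", "cover.png",       # any placeholder image
    "-i", audio_in,
    "-c:v", "libx264",
    "-tune", "stillimage",
    "-c:a", "aac",
    "-pix_fmt", "yuv420p",   # broad player compatibility
    "-shortest",             # stop when the audio ends
    video_out,
], check=True)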
But none of that will rescue Google's failing business model, which has been nearly as enshittified as Fakebook's. I don't hate Google, but is precision search really so difficult? Nope! It's just not profitable. Hey, did I mention fucking over the poor?
This, if true (and it probably is), is a big deal, since it makes 4B-parameter LLMs possible on mobile devices. Everyone currently figures we're stuck with 1B models on mobile. 1B models are dumb; 4B models are not. 14B would be better, but that's unrealistic on a phone, at least not without serious quantization (in other words, stupefaction).
https://venturebeat.com/ai/new-llm-optimization-technique-slashes-memory-costs-up-to-75/
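To put rough numbers on why 4B is the on-device sweet spot, here's a back-of-the-envelope calculation for weight memory alone (it ignores the KV cache, activations, and runtime overhead; the bits-per-weight figures are the usual ballpark, not anything from the article).

# Rough weight-memory math for on-device LLMs (weights only).
GIB = 1024 ** 3

def weight_gib(params_billions, bits_per_weight):
    # params * bits / 8 = bytes; convert to GiB
    return params_billions * 1e9 * bits_per_weight / 8 / GIB

for params in (1, 4, 14):
    for bits in (16, 8, 4):
        print(f"{params:>2}B model @ {bits:>2}-bit weights: "
              f"~{weight_gib(params, bits):.1f} GiB")

Even at 4-bit, a 14B model's weights alone come to roughly 6.5 GiB, which is why it stays off most phones, while a 4B model squeaks in at around 2 GiB.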
Here are some vids about Google, followed by the simplest local RAG possible: a Python script that lets a local Ollama model answer questions over the text files in ~/Dump.
import os
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_ollama import OllamaLLM
from langchain.chains import RetrievalQA
from langchain import hub
# Initialize Ollama LLM
llm = OllamaLLM(model="mistral-nemo")
# Set up document processing
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
# Define the path to the Dump directory
dump_dir = os.path.expanduser("~/Dump")
# Function to load documents from the Dump directory
def load_documents_from_dump():
    docs = []
    for filename in os.listdir(dump_dir):
        if filename.endswith(".txt"):  # Adjust this if you want to include other file types
            file_path = os.path.join(dump_dir, filename)
            loader = TextLoader(file_path)
            docs.extend(loader.load())
    return docs
# Load and process documents
docs = load_documents_from_dump()
texts = text_splitter.split_documents(docs)
# Debug print
print(f"Number of documents loaded: {len(texts)}")
if texts:
    print(f"First document content: {texts[0].page_content[:100]}...")  # Print first 100 characters
else:
    print("No documents found in ~/Dump. Please add some text files.")
    exit()
# Create vector store
vectorstore = Chroma.from_documents(texts, embeddings)
# Set up retriever
retriever = vectorstore.as_retriever()
# Create RAG chain
rag_prompt = hub.pull("rlm/rag-prompt")
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever,
    chain_type_kwargs={"prompt": rag_prompt}
)
# Main query loop
while True:
    query = input("Enter your question (or 'quit' to exit): ")
    if query.lower() == 'quit':
        break
    result = qa_chain.invoke({"query": query})  # .invoke() replaces the deprecated direct chain call
    print(result['result'])

print("Thank you for using the RAG system!")