
Run Llama Without a GPU! Quantized LLM with LLMWare and Quantized Dragon

2024/1/8

Machine Learning Tech Brief By HackerNoon

Shownotes Transcript

This story was originally published on HackerNoon at: https://hackernoon.com/run-llama-without-a-gpu-quantized-llm-with-llmware-and-quantized-dragon. Use AI miniaturization to get high-level performance out of LLMs running on your laptop! Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #llm, #chatgpt, #quantization, #rag, #python, #mlops, #gpu-infrastructure, #hackernoon-top-story, #hackernoon-es, #hackernoon-hi, #hackernoon-zh, #hackernoon-fr, #hackernoon-bn, #hackernoon-ru, #hackernoon-vi, #hackernoon-pt, #hackernoon-ja, #hackernoon-de, #hackernoon-ko, #hackernoon-tr, and more.

This story was written by [@shanglun](https://hackernoon.com/u/shanglun). Learn more about this writer on [@shanglun's](https://hackernoon.com/about/shanglun) about page, and for more stories, please visit [hackernoon.com](https://hackernoon.com).
        
            
            
As GPU resources become more constrained, miniaturization and specialist LLMs are slowly gaining prominence. Today we explore quantization, a cutting-edge miniaturization technique that allows us to run high-parameter models without specialized hardware.
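To make the idea concrete, here is a minimal sketch of the core principle behind quantization: storing each float32 weight as an 8-bit integer plus a shared scale factor, which cuts memory roughly 4x at a small cost in precision. This is an illustrative toy in plain Python, not LLMWare's API; real toolchains (e.g. GGUF models used by LLMWare) use more sophisticated block-wise schemes.

```python
# Toy symmetric int8 quantization: illustrative only, not LLMWare's
# actual implementation. Each weight is mapped to an integer in
# [-127, 127] sharing a single scale factor.

def quantize_int8(weights):
    """Map float weights to int8 values with one shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return [v * scale for v in q]

# Hypothetical example weights for demonstration.
weights = [0.42, -1.37, 0.08, 0.91]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
```

Because rounding error is bounded by half the scale, the recovered weights stay close to the originals, which is why quantized models retain most of their quality while fitting in laptop RAM.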