Efficient LLM inference
The part about GPTQ is pretty bizarre - I would've thought quantization is just doing what you showed at scale. Maybe it works because it performs the rounding as a vectorized operation, rather than naive rounding, which is slower? That doesn't sound like I'm saying anything intelligent. A tad funny that we don't know exactly why quantization works.
Efficiency in inference for large language models is paramount, and this article provides valuable insights. This article highlights key strategies for optimizing inference in large language models, emphasizing the significance of code profiling and simple optimizations like data structure changes to enhance performance and minimize resource utilization.
Seems like the figure comparing model sizes to levels of precision is missing.
I've posted some of the conversations I've been having with ChatGPT (Chatty). Thought you might find them interesting. I'd definitely love to hear your feedback on the models Chatty develops as a high-level architecture for a GAI entity's sentience.
I've finished the second of three posts, which covers the second section of a long interview (part of a series I’ve done). You can find the first post in the series @ In It's Own Words- "I Apologize. I Can't Do That". In the post, I ask Chatty (ChatGPT) to write a series of essays touching on various aspects of biological sensation and perception. This article ends just prior to me asking Chatty to “generate an emergent model of qualia, which begins with single cell organisms and ranges through advanced sensory and cognitive organisms.” Chatty’s answer sets up the final part of this three-part interview, which has Chatty model the GAI entity analog of the biological developmental stages in which, Chatty speculates (given an architecture that could support such high-level models), GAI entities would experience subjective and (perhaps) qualitative states. Chatty’s speculations on instantiating GAI volition are presented at the end of the final section of this three-part interview.