Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs

(fireworks.ai)
1 point by kmdupree on March 21, 2024