Friday, 22 December 2023

New top story on Hacker News: Show HN: Local fine tuning for Mistral and SDXL, GPU mem/latency optimization

21 by lewq | 3 comments on Hacker News.
100% bootstrapped new startup. It lets you fine tune Mistral-7B and SDXL. In particular, for the LLM fine tuning we implemented a dataprep pipeline that turns websites, PDFs and doc files into question-answer pairs, using a big LLM to generate the training data for the small LLM.

It includes a GPU scheduler that does fine-grained GPU memory scheduling. Kubernetes can only schedule whole GPUs; we schedule per GB of GPU memory, so both inference and fine tuning jobs can be packed into the same fleet. The scheduler fits model instances into GPU memory to trade off user-facing latency against GPU memory utilization.

It's a pretty simple stack: a control plane and a fat container that runs anywhere you can get hold of a GPU (e.g. runpod).

Architecture: https://ift.tt/6GwSjgL
Demo walkthrough showing runner dashboard: https://ift.tt/7PZ5lVh
Run it yourself: https://ift.tt/yV0lKTC
Discord: https://ift.tt/RikFTso

Please roast me!
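To make the per-GB packing idea concrete, here is a minimal sketch in Python. It is not the project's actual scheduler: the names `Job`, `Gpu` and `place_jobs` are hypothetical, the memory figures are made-up illustrative numbers, and first-fit-decreasing is an assumed heuristic picked only because it is simple. It also ignores the latency side of the trade-off entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    mem_gb: int  # GPU memory the job needs, in GB (illustrative values only)


@dataclass
class Gpu:
    name: str
    total_gb: int
    jobs: list = field(default_factory=list)

    @property
    def free_gb(self) -> int:
        # Memory not yet claimed by jobs placed on this GPU.
        return self.total_gb - sum(j.mem_gb for j in self.jobs)


def place_jobs(jobs, gpus):
    """First-fit-decreasing: take the largest job first and put it on the
    first GPU with enough free memory. Returns the jobs that did not fit."""
    unplaced = []
    for job in sorted(jobs, key=lambda j: j.mem_gb, reverse=True):
        target = next((g for g in gpus if g.free_gb >= job.mem_gb), None)
        if target is None:
            unplaced.append(job)
        else:
            target.jobs.append(job)
    return unplaced


# Hypothetical fleet of two 24 GB GPUs and a mix of inference and fine tuning jobs.
gpus = [Gpu("gpu-0", 24), Gpu("gpu-1", 24)]
jobs = [
    Job("mistral-7b-inference", 16),
    Job("sdxl-inference", 10),
    Job("mistral-7b-finetune", 14),
    Job("sdxl-finetune", 8),
]
unplaced = place_jobs(jobs, gpus)

for gpu in gpus:
    print(gpu.name, [j.name for j in gpu.jobs], "free:", gpu.free_gb, "GB")
```

With whole-GPU allocation these four jobs would each claim their own GPU; in this toy example, packing by GB fits all four onto two.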
