Edition 37 🌸

Decentralised training looks promising.

A few days ago, I received a Slack notification from the Flower Dev Community—known for their impressive work in federated learning. Curious, I checked out their page and discovered something exciting: FlowerLLM, a small yet powerful language model they've developed.

I explored further and found some promising insights. Stay Tuned :)

🌸 LLM

Flower, in partnership with the CaMLSys lab at the University of Cambridge, has trained for the first time a 1.3 billion parameter LLM using a novel formulation of federated learning methods. The resulting LLM and companion methodology, which we call FlowerLLM, beats the previous record set by Google DeepMind by more than a factor of three and paves the way towards an era of increasingly democratic participation in foundation models. Arguably of even greater importance, this invention of a viable federated approach to LLM pre-training is likely to directly lead to stronger, more capable foundation models by increasing access to both compute and data.
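
To make the "federated" part concrete: in federated training, participants never share their raw data, only model updates, and those updates are averaged into a new global model each round. Below is a minimal FedAvg-style sketch of that round structure in plain NumPy. It is only an illustration of the idea, not Flower's FlowerLLM pipeline; `local_train` and its `grad_fn` argument are hypothetical placeholders for a participant's local optimiser.

```python
import numpy as np

def local_train(weights, grad_fn, steps=10, lr=1e-2):
    """Hypothetical stand-in for a participant's local optimiser on private data."""
    w = {k: v.copy() for k, v in weights.items()}
    for _ in range(steps):
        grads = grad_fn(w)          # gradient of this participant's local loss
        for k in w:
            w[k] = w[k] - lr * grads[k]
    return w

def fedavg_round(global_weights, participant_grad_fns):
    """One communication round: broadcast weights, train locally, average results."""
    local_models = [local_train(global_weights, g) for g in participant_grad_fns]
    return {
        k: np.mean([m[k] for m in local_models], axis=0)
        for k in global_weights
    }
```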

OpenDiLoCo is an open-source implementation and replication of the Distributed Low-Communication (DiLoCo) training method for large language models. We provide a reproducible implementation of the DiLoCo experiments, offering it within a scalable, decentralized training framework using the Hivemind library. We demonstrate its effectiveness by training a model across two continents and three countries, while maintaining 90-95% compute utilization. Additionally, we conduct ablation studies focusing on the algorithm’s compute efficiency and scalability in the number of workers, and show that its gradients can be all-reduced using FP16 without any performance degradation. Furthermore, we scale OpenDiLoCo to 3× the size of the original work, demonstrating its effectiveness for billion-parameter models.
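
Reading between the lines of that abstract, the trick that makes cross-continent training feasible is communicating rarely: each worker runs many inner optimiser steps locally, and only a "pseudo-gradient" (old global weights minus the worker's new weights) is exchanged and averaged, which the authors show can be done in FP16 without hurting quality. Here is a hedged sketch of that outer loop in NumPy; it is not the OpenDiLoCo/Hivemind code, `local_train` is assumed to encapsulate a worker's inner steps, and the `outer_lr`/`beta` values are made up.

```python
import numpy as np

def diloco_outer_step(global_w, worker_shards, local_train,
                      momentum, outer_lr=0.7, beta=0.9):
    """One DiLoCo-style outer round (illustrative only, hypothetical API).

    local_train(weights, shard) is assumed to run a worker's many inner
    optimiser steps on its own data shard and return the updated weights.
    """
    local_ws = [local_train(global_w, shard) for shard in worker_shards]

    # Pseudo-gradient: old global weights minus each worker's new weights,
    # cast to FP16 before the (simulated) all-reduce, then averaged.
    delta = {
        k: np.mean(
            [(global_w[k] - lw[k]).astype(np.float16) for lw in local_ws], axis=0
        ).astype(np.float32)
        for k in global_w
    }

    # Outer update: momentum SGD on the pseudo-gradient (DiLoCo uses a
    # Nesterov-momentum outer optimiser; this mirrors that shape).
    new_w, new_m = {}, {}
    for k in global_w:
        new_m[k] = beta * momentum[k] + delta[k]
        new_w[k] = global_w[k] - outer_lr * (delta[k] + beta * new_m[k])
    return new_w, new_m
```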

Deploying Large Language Models (LLMs) locally on mobile devices presents a significant challenge due to their extensive memory requirements. In this paper, we introduce LinguaLinked, a system for decentralized, distributed LLM inference on mobile devices. LinguaLinked enables collaborative execution of the inference task across multiple trusted devices. LinguaLinked ensures data privacy by processing information locally. LinguaLinked uses three key strategies. First, an optimized model assignment technique segments LLMs and uses linear optimization to align segments with each device’s capabilities. Second, an optimized data transmission mechanism ensures efficient and structured data flow between model segments while also maintaining the integrity of the original model structure. Finally, LinguaLinked incorporates a runtime load balancer that actively monitors and redistributes tasks among mobile devices to prevent bottlenecks, enhancing the system’s overall efficiency and responsiveness. We demonstrate that LinguaLinked facilitates efficient LLM inference while maintaining consistent throughput and minimal latency through extensive testing across various mobile devices, from high-end to low-end Android devices. In our evaluations, compared to the baseline, LinguaLinked achieves an inference performance acceleration of 1.11× to 1.61× in single-threaded settings, 1.73× to 2.65× with multi-threading. Additionally, runtime load balancing yields an overall inference acceleration of 1.29× to 1.32×.
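
The first of those three strategies, segmenting the model and aligning segments with each device's capabilities, is the easiest to picture in code. The sketch below greedily packs contiguous layers into per-device segments by memory budget; LinguaLinked itself formulates this assignment as a linear optimization, so treat this only as an illustration of the problem shape, with all names and numbers hypothetical.

```python
def assign_segments(layer_sizes_mb, device_budgets_mb):
    """Greedily pack contiguous layers into per-device segments.

    layer_sizes_mb: memory footprint of each transformer layer, in order.
    device_budgets_mb: available memory per device, in pipeline order.
    Returns a list of (device_index, [layer_indices]) segments.
    """
    segments, current, dev, used = [], [], 0, 0.0
    for i, size in enumerate(layer_sizes_mb):
        if current and used + size > device_budgets_mb[dev]:
            segments.append((dev, current))
            dev, current, used = dev + 1, [], 0.0
            if dev >= len(device_budgets_mb):
                raise ValueError("model does not fit on the available devices")
        current.append(i)
        used += size
    segments.append((dev, current))
    return segments

# Example: a 24-layer model (~180 MB per layer) split across three phones.
print(assign_segments([180.0] * 24, [1024.0, 2048.0, 2048.0]))
# -> three contiguous segments: layers 0-4 on device 0, 5-15 on device 1, 16-23 on device 2
```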

Researchers with the University of California at Irvine have built LinguaLinked, software that lets a bunch of mobile phones collectively run and serve language models. This is the kind of research that matters a lot for AI policy - most AI policy relies on some notion of cloud infrastructure and big data centers serving as central control points for AI systems. But research like this breaks that assumption - if you can access the weights of a model, then you can serve it guerilla style from a whole bunch of mobile phones which you’ve cleverly chained together. 

    “The core concept behind LinguaLinked is to distribute segments of an LLM across multiple mobile devices, which then work together to serve inference queries,” the researchers write. This is a big deal wrapped in a dull technical paper!

Here's a thought: "if this happens, the current AI policy paradigm will break." What's your view? Let's discuss!

Thanks for reading Musings on AI! This post is public so feel free to share it.

Love MusingsOnAI? Tell your friends and get rewards!

If your company is interested in reaching an audience of AI professionals and decision makers, reach us.

If you have any comments or feedback, just respond to this email!

Thanks for reading,
Raahul
