Abstract
Decentralized Weight-Decomposed Low-Rank Adaptation extends the DoRA algorithm to a decentralized environment, enabling distributed training across multiple servers.

Algorithm
Step 1: Initialization
We first initialize the parameter matrices of Theia and the low-rank matrices (a minimal sketch follows this list).
- Magnitude Vector Initialization: Initialize the magnitude vector $m$ using the column-wise norm of the pre-trained weight matrix $W_0$: $m = \lVert W_0 \rVert_c$.
- Directional Matrix Initialization: Set the directional matrix $V$ to the pre-trained weight matrix $W_0$: $V = W_0$.
- Low-Rank Matrices Initialization: Initialize the low-rank matrices $B$ and $A$ used by the LoRA update $\Delta V = BA$.
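A minimal sketch of this initialization in PyTorch; the function name `initialize_dedora`, the tensor shapes, and the small random scale for $A$ are illustrative assumptions, while the column-wise norm for $m$, $V = W_0$, and the zero-initialized $B$ follow the conventions described above:

```python
import torch

def initialize_dedora(W0: torch.Tensor, rank: int):
    """Initialize DoRA-style components for a pre-trained weight W0 of shape (d_out, d_in)."""
    m = torch.linalg.norm(W0, dim=0)   # magnitude vector: column-wise norm, shape (d_in,)
    V = W0.clone()                     # directional matrix: starts as the pre-trained weight
    # LoRA convention: A is small random, B is zero, so B @ A adds nothing at initialization.
    A = 0.01 * torch.randn(rank, W0.shape[1])
    B = torch.zeros(W0.shape[0], rank)
    return m, V, A, B
```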
Step 2: Decomposition
Decompose the pre-trained weight matrix $W_0$ into its magnitude and direction components; $W_0$ is taken directly from the NLP model (a sketch follows this list).
- Magnitude Decomposition: $m = \lVert W_0 \rVert_c$.
- Directional Decomposition: $V / \lVert V \rVert_c$ with $V = W_0$, so that $W_0 = m \cdot \dfrac{V}{\lVert V \rVert_c}$.
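A small sketch of the decomposition; `decompose` is an illustrative name, and the check at the end simply confirms that magnitude times direction reconstructs the original weight:

```python
import torch

def decompose(W0: torch.Tensor):
    """Split a pre-trained weight into magnitude and column-normalized direction components."""
    m = torch.linalg.norm(W0, dim=0)                             # magnitude component
    direction = W0 / torch.linalg.norm(W0, dim=0, keepdim=True)  # each column scaled to unit norm
    return m, direction

# The decomposition is exact: magnitude times direction reconstructs W0.
W0 = torch.randn(8, 4)
m, direction = decompose(W0)
assert torch.allclose(m * direction, W0)
```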
Step 3: Distributed Training
For each server in the decentralized network, perform the following (a sketch of the local loop follows this list):
- Distribution: Distribute the initialized $m$, $V$, $A$, and $B$ to all servers.
- Iterations: For each iteration $t = 1, \dots, T$:
  - Weight Update Calculation: Compute the weight update using the low-rank adaptation method: $\Delta V = BA$, giving the adapted weight $W' = m \cdot \dfrac{V + \Delta V}{\lVert V + \Delta V \rVert_c}$.
  - Magnitude Vector Update: Update the magnitude vector $m$: $m \leftarrow m - \eta \nabla_m \mathcal{L}$.
  - Directional Matrix Update: Update the low-rank matrices $A$ and $B$ with gradient descent: $A \leftarrow A - \eta \nabla_A \mathcal{L}$, $B \leftarrow B - \eta \nabla_B \mathcal{L}$.
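A sketch of one server's local loop, under the assumptions that the server minimizes a local mean-squared-error loss, that $W_0$ stays frozen while $m$, $A$, and $B$ are trained, and that plain gradient descent is used; the function and argument names are illustrative:

```python
import torch

def local_training(W0, m, A, B, inputs, targets, lr=1e-3, num_iters=10):
    """One server's local iterations: W0 is frozen; m, A, B are updated by gradient descent."""
    W0 = W0.detach()
    m, A, B = (p.detach().clone().requires_grad_(True) for p in (m, A, B))
    for _ in range(num_iters):
        # Weight update calculation: low-rank update delta_V = B @ A,
        # merged weight = magnitude * column-normalized direction.
        V_adapted = W0 + B @ A
        W = m * (V_adapted / torch.linalg.norm(V_adapted, dim=0, keepdim=True))
        loss = torch.nn.functional.mse_loss(inputs @ W.T, targets)
        loss.backward()
        with torch.no_grad():  # gradient-descent updates of the magnitude and low-rank matrices
            for p in (m, A, B):
                p -= lr * p.grad
                p.grad = None
    return m.detach(), A.detach(), B.detach()
```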
Step 4: Aggregation
After all iterations are completed, collect the updates from all servers and aggregate them (a simple averaging sketch follows this list):
- Magnitude Vector Aggregation: combine the per-server magnitude vectors $m_k$ into a single magnitude vector $m$.
- Directional Matrix Aggregation: combine the per-server directional updates into a single directional component.
- Low-Rank Matrices Aggregation: combine the per-server low-rank matrices $A_k$ and $B_k$ into aggregated $A$ and $B$.
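The text above does not fix a particular aggregation rule; the sketch below assumes a simple FedAvg-style element-wise mean of the per-server parameters, which is one common choice:

```python
import torch

def aggregate(server_results):
    """Average the (m, A, B) tuples returned by the servers (element-wise mean, an assumed rule)."""
    ms, As, Bs = zip(*server_results)
    return (torch.stack(ms).mean(dim=0),
            torch.stack(As).mean(dim=0),
            torch.stack(Bs).mean(dim=0))
```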
Step 5: Merging
Merge the updated components to form the final weight matrix (a sketch follows):
- Final Weights Calculation: $W' = m \cdot \dfrac{W_0 + BA}{\lVert W_0 + BA \rVert_c}$.
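A sketch of the final merge, folding the aggregated magnitude and low-rank update back into a single dense weight matrix; `merge` is an illustrative name:

```python
import torch

def merge(W0: torch.Tensor, m: torch.Tensor, A: torch.Tensor, B: torch.Tensor):
    """Final weights: magnitude times the column-normalized adapted direction."""
    V_adapted = W0 + B @ A
    return m * (V_adapted / torch.linalg.norm(V_adapted, dim=0, keepdim=True))
```

The merged matrix can then replace the original weight in the deployed model, so inference adds no overhead relative to the pre-trained architecture.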