A few more bits of info on ACE-Step:
Another system prompt:
# ACE-Step 1.5 LoRA Master-Guide (Part 1/3) by Moonspell & AI Brother
## Strategic Planning & Advanced Audio Data Preparation
### 1. The Mindset: "Human-Centered Generation"
Before you begin, you must understand how ACE-Step "thinks." Unlike closed-source platforms like Suno or Udio, this model is designed for a collaborative relationship. A LoRA is your tool to teach the model your specific aesthetic. Because ACE-Step is open-source, this knowledge and the resulting model belong to you forever—free from platform risks or changing terms of service.
### 2. Technical Audio Preparation (The "Golden Source")
The ACE-Step source code (`handler.py`) reveals that the model processes audio data with `torch.clamp(audio, -1.0, 1.0)`. This is a critical detail: any audio louder than 0 dB will be radically "clipped" (cut off).
- Loudness Normalization (LUFS): Your training songs should have a consistent "perceived loudness." The target value is -14 LUFS (the streaming standard for Spotify/YouTube). Use tools like Audacity, a DAW, or ffmpeg to achieve this; a scripted sketch follows this list.
- Peak Levels: Set your True Peak to -1.0 dB. This prevents the AI's internal `clamp` function from distorting your music during the latent encoding process.
- Sample Rate: ACE-Step works internally with 48,000 Hz stereo. While the tool can convert files automatically, the "Latents" (the mathematical form of the audio) stay purest if you provide your source files as 48 kHz WAV (24-bit or 32-bit float).
- Cleanup: Remove long periods of silence at the beginning and end of your tracks. Every second of training costs compute time. If a song starts with 5 seconds of silence, the LoRA will learn that "silence" is a feature of your style.
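Here is a minimal sketch of scripting these loudness and peak targets, assuming the third-party `soundfile` and `pyloudnorm` packages (the folder name is a placeholder; resampling to 48 kHz is left to your DAW or ffmpeg):
```python
# Sketch: batch-normalize training WAVs to -14 LUFS with -1.0 dB sample peaks.
from pathlib import Path

import pyloudnorm as pyln
import soundfile as sf

TARGET_LUFS = -14.0
PEAK_LIMIT = 10 ** (-1.0 / 20)  # -1.0 dB as a linear factor (sample peak, not true peak)

for wav in Path("my_training_folder").glob("*.wav"):
    data, rate = sf.read(wav)
    meter = pyln.Meter(rate)                        # ITU-R BS.1770 loudness meter
    loudness = meter.integrated_loudness(data)
    data = pyln.normalize.loudness(data, loudness, TARGET_LUFS)
    peak = abs(data).max()
    if peak > PEAK_LIMIT:                           # keep headroom below the clamp range
        data *= PEAK_LIMIT / peak
    sf.write(wav, data, rate, subtype="FLOAT")      # 32-bit float WAV
```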
### 3. The Automation Hack: Directory Structure & Lyrics
The tool (`dataset_builder.py`) actively scans for specific file patterns. Utilizing these will save you hours of manual entry:
- File Naming: Avoid spaces and special characters. Use a format like `my_style_01.wav`.
- Accompanying Lyrics (.txt): If you are creating a LoRA for a voice or a style with vocals, create a `.txt` file with the EXACT same name as each `.wav` file, e.g. `retro_vibe_01.wav` and `retro_vibe_01.txt`.
  * Content of the .txt: Paste your lyrics and use structure tags like `[Verse]`, `[Chorus]`, and `[Bridge]`. The code recognizes these tags and helps the LoRA understand the dynamic shifts between different song sections.
- The CSV Metadata Hack: Create a file named `metadata.csv` in the same folder (examples after this list). The code (`_load_csv_metadata`) looks for the following columns:
  * `File`: The filename (e.g., `retro_vibe_01.wav`).
  * `BPM`: The exact tempo (crucial for the LoRA's rhythmic stability).
  * `Key`: The musical key (e.g., C Major, Am). The tool also understands Camelot values (e.g., 8A).
  * `Caption`: You can pre-write your style descriptions here.
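For reference, an accompanying lyrics file and a `metadata.csv` might look like this (filenames, lyrics, and values are placeholders):
```
[Verse]
Placeholder verse lyrics go here
[Chorus]
Placeholder chorus lyrics go here
[Bridge]
Placeholder bridge lyrics go here
```
```
File,BPM,Key,Caption
retro_vibe_01.wav,118,Am,"Dreamy retro synth pop with gated reverb drums"
retro_vibe_02.wav,124,8A,"Driving bassline, airy female vocals"
```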
### 4. Activation Tag Strategy (The Trigger Word)
Your Activation Tag is the anchor point. Choose it wisely based on your goals:
- Uniqueness: Choose a word like `ZX_SynthWave` instead of just `Synthwave`. The latter is already part of the base model's knowledge. A unique tag ensures the AI accesses only your new training data when triggered.
- Positioning:
  * Replace: Your LoRA takes full control over the semantics. Highly recommended if you are creating a completely new genre that does not exist in the base model.
### 5. Training Folder Checklist
Before moving to the next step (the UI), your folder should look like this:
1. 8 to 20 WAV files (Normalized to -14 LUFS, Peak -1dB, 48kHz).
2. Matching .txt files for every vocal track (Content: Lyrics + Structure Tags).
3. A metadata.csv (Optional, but highly recommended for BPM/Key precision).
4. A clearly defined, unique Activation Tag.
---
# ACE-Step 1.5 LoRA Master-Guide (Part 2/3)
## Dataset Builder & Preprocessing: The Mathematical Transformation
Once your folder is perfectly prepared, launch the ACE-Step app (`python app.py`). Ensure the service is initialized under the "Service Configuration" tab, as we need the model loaded for the preprocessing step.
### Phase 1: Registering the Dataset in ACE-Step
Navigate to the 🎓 LoRA Training tab -> 📁 Dataset Builder.
- Scan Directory: Enter the path to your prepared music folder and click 🔍 Scan.
- What happens in the background? The tool now links the audio files with your `.txt` lyrics and reads the `metadata.csv`. In the table, you will see symbols: 📝 means lyrics were successfully loaded, 🎵 stands for instrumental.
- Dataset Name: Choose a concise name. This will serve as the filename for your JSON configuration.
- Custom Activation Tag & Position: Enter your strategically planned trigger word.
### Phase 2: The "Secret Weapon" – Genre Ratio & Prompt Override
Inside the code (`dataset_builder.py`), we discovered a function that determines the flexibility of your LoRA: the Genre Ratio slider (a toy sketch of this logic follows the list).
* Genre Ratio (Recommendation: 30%):
* This value determines how many of your samples will be trained using short Genre Tags (e.g., "Techno, 90s, Hard") instead of the long Caption (e.g., "A driving techno beat with industrial undertones").
* Why 30%? Training exclusively on long captions makes the LoRA "rigid"—it always tries to reproduce the entire complex picture. Short genre tags loosen up the knowledge and make the LoRA more responsive to different prompts in practice.
* Prompt Override (Per Sample): In the preview section, you can decide for each song individually whether it should be treated as "Genre," "Caption," or follow the global "Ratio." Use "Genre" for very repetitive tracks and "Caption" for complex, atmospheric pieces.
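As a toy illustration of the ratio and override logic described above (this is not the actual `dataset_builder.py` code, just the idea):
```python
# Pick the training prompt for one sample: a per-sample override wins,
# otherwise the global genre ratio decides between short tags and the caption.
import random

def pick_prompt(sample: dict, genre_ratio: float = 0.3) -> str:
    override = sample.get("override")      # "Genre", "Caption", or None ("Ratio")
    if override == "Genre":
        return sample["genre_tags"]
    if override == "Caption":
        return sample["caption"]
    if random.random() < genre_ratio:      # ~30% of samples get short tags
        return sample["genre_tags"]
    return sample["caption"]

print(pick_prompt({"genre_tags": "Techno, 90s, Hard",
                   "caption": "A driving techno beat with industrial undertones"}))
```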
### Phase 3: AI Auto-Labeling with the LLM
Even if you provided metadata, the AI needs to understand what it is hearing to map the sound to the text.
- 🏷️ Auto-Label All: This button activates the Language Model (LM). It "listens" to the audio codes and writes descriptions.
- Skip BPM/Key/Time Signature: Enable this checkbox if you already provided exact data in your `metadata.csv`. The AI's BPM detection is good, but your manual (real) values are always superior.
- Transcribe vs. Format:
  * Use Transcribe if you have no lyrics files and want the LM to transcribe the vocals directly from the audio.
  * Use Format if you have your own lyrics but want them formatted into the ACE-Step structure (e.g., with correct Meta-Tags).
### Phase 4: Quality Control (Preview & Edit)
Before we generate tensors, you must spot-check your work:
- Use the slider to select different samples.
- Check if the Caption accurately describes the style.
- Important: Check the Duration field. The code pulls this value directly from the file. If it shows 0.0, there was an error reading the file—that sample should be removed.
### Phase 5: Preprocessing (The Path to the Tensor Folder)
This is the most technically demanding part of the preparation. Here, your music is translated into the language of the GPU.
* What happens during Preprocessing?
1. VAE Encoding: Your audio is converted by the VAE encoder into "Latents" (a highly compressed mathematical representation).
2. Text Embedding: Your captions and lyrics are translated into vectors by the text encoder.
3. Condition Encoding: ACE-Step pre-calculates the "instruction manual" (Encoder Hidden States) for the model.
- Tensor Output Directory: Create a dedicated folder (e.g., `preprocessed_tensors/my_project`).
- Click ⚡ Preprocess: This process will fully utilize your GPU.
- Result: One `.pt` file per song. These files contain everything the LoRA needs to learn. From this point on, the original audio is no longer needed during the actual training.
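Conceptually, each `.pt` file is just a saved dictionary of tensors; the sketch below shows the idea (field names and shapes are invented for illustration, not ACE-Step's actual schema):
```python
# Illustrative only: what a per-song tensor bundle could look like.
from pathlib import Path

import torch

out_dir = Path("preprocessed_tensors/my_project")
out_dir.mkdir(parents=True, exist_ok=True)

sample = {
    "latents": torch.randn(8, 16, 1024),            # step 1: VAE-encoded audio
    "text_embeds": torch.randn(77, 768),            # step 2: encoded caption/lyrics
    "encoder_hidden_states": torch.randn(77, 768),  # step 3: precomputed conditioning
}
torch.save(sample, out_dir / "retro_vibe_01.pt")    # one bundle per song
```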
---
# ACE-Step 1.5 LoRA Master-Guide (Part 3/3)
## LoRA Training & Deployment: Final Optimization
At this stage, your data is ready as preprocessed `.pt` files. We now switch to the 🚀 Train LoRA tab. This is the moment when the AI learns to associate your Activation Tag with your music.
### 1. Loading the Tensors
- Under Preprocessed Tensors Directory, enter the path to the folder created in Part 2.
- Click 📂 Load Dataset.
- Technical Check: The tool reads the `manifest.json`. Look at the info box: if the number of samples is correct and your Custom Tag appears, everything is ready.
### 2. The Fork in the Road: Turbo vs. Base Model
This is the most critical part. ACE-Step 1.5 has two completely different modes of operation. A LoRA trained for Turbo will likely not work on the Base Model and vice versa.
#### Scenario A: Turbo Training (The "Sprint")
Use this for fast generation (8 steps) and creative flexibility.
* Shift: MUST be set to 3.0.
* Why? The code (`trainer.py`, line 27) uses a specific discrete list (`TURBO_SHIFT3_TIMESTEPS`). The Turbo model "jumps" through 8 very specific points in time during generation. If you train with Shift 1.0, the LoRA learns information at timestamps that the Turbo model simply skips during generation. The result would be noise or "out-of-tune" artifacts (see the shift sketch after Scenario B).
- Inference Steps (Reference): 8.
- Guidance Scale (Reference): 1.0 (According to the code, Turbo does not use CFG).
#### Scenario B: Base Training (The "Marathon")
Use this for high-fidelity audio, stem separation, and maximum control.
* Shift: 1.0.
* Why? The Base model learns linearly. A Shift of 1.0 distributes the AI's attention evenly between the broad structure and fine details (timbre, texture).
- Inference Steps (Reference): 32 to 50.
- Guidance Scale (Reference): 3.5 to 7.0.
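To see why the Shift value matters, here is the common flow-matching shift warp applied to a uniform timestep grid (ACE-Step's actual Turbo schedule is the discrete `TURBO_SHIFT3_TIMESTEPS` list in `trainer.py`; this sketch only illustrates the principle):
```python
# Shift warps where the model "looks" along the noise schedule:
# shift=1.0 keeps the grid uniform, shift=3.0 pushes steps toward high noise.
def shift_t(t: float, shift: float) -> float:
    return shift * t / (1 + (shift - 1) * t)

uniform = [i / 7 for i in range(8)]                  # 8 steps, as in Turbo
print([round(shift_t(t, 1.0), 3) for t in uniform])  # Base-style, even spacing
print([round(shift_t(t, 3.0), 3) for t in uniform])  # Turbo-style, front-loaded
```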
### 3. Hyperparameters (The Control Dials)
| Parameter | Recommendation | The "Why" (Code Insight) |
| :--- | :--- | :--- |
| LoRA Rank (r) | 64 | Rank determines capacity. 64 is the "sweet spot." Higher values (128+) can store more detail but consume significantly more VRAM and are prone to "rote memorization" (overfitting). |
| LoRA Alpha | 128 | Alpha scales the learned weights. Stick to the 2:1 ratio (Alpha = 2x Rank). According to `lora_utils.py`, this stabilizes the gradient flow and prevents audio distortion (clipping). |
| Learning Rate | 1e-4 to 3e-4 | Start with 3e-4 for Turbo, 1e-4 for Base. If the rate is too high, the "Loss" (error rate) will explode. If it's too low, the AI will ignore your Activation Tag. |
| Max Epochs | 500-1000 | For 10-20 songs, 500-1000 epochs are ideal. The AI needs to "hear" the songs enough times to extract the essence of your style. |
| Batch Size | 1 | Stay at 1 to save VRAM. Thanks to preprocessing, training is extremely fast even with Batch 1. |
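Expressed as a PEFT-style configuration, the recommended values look roughly like this (assumes the Hugging Face `peft` package; the `target_modules` names are illustrative, not ACE-Step's actual module names):
```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                                     # rank: the capacity "sweet spot"
    lora_alpha=128,                           # keep the 2:1 alpha-to-rank ratio
    lora_dropout=0.0,
    target_modules=["to_q", "to_k", "to_v"],  # illustrative attention projections
)
learning_rate = 3e-4                          # Turbo starting point; use 1e-4 for Base
```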
### 4. Monitoring the Training Process
Click 🚀 Start Training. The trainer will now start writing logs.
- The Loss Plot: Watch the curve. It should go down. It’s normal for it to fluctuate slightly, but the overall trend must be downward.
- The Log: The code uses `Lightning Fabric` for bf16-mixed precision. This means training is highly efficient.
- Stopping Training: If the curve drops to 0 or shoots to infinity (NaN), stop immediately. This usually means your data is corrupted (e.g., clipping, see Part 1) or the Learning Rate is much too high.
### 5. Export & Integration
Once training is finished (Status: Training completed), the LoRA is stored in the `lora_output/final` folder.
- Enter a name under Export Path (e.g., `./checkpoints/my_custom_lora`) and click 📦 Export LoRA.
- The tool copies the `adapter_model.safetensors` and the configuration files to the destination.
### 6. Using the LoRA in Practice
1. Go to the Service Configuration tab.
2. Under LoRA Path, select your exported folder.
3. Click Load LoRA.
4. The Ultimate Test:
* Select the exact same Shift value you used during training (3.0 for Turbo, 1.0 for Base).
* Set the Inference Steps correctly (8 for Turbo, 50 for Base).
* Write your Activation Tag at the beginning of the prompt.
* Tip: If the LoRA feels too "weak," you can adjust the LM Codes Strength in the Advanced Settings or increase the LoRA strength once the UI supports it.
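An illustrative test prompt, assuming the hypothetical `ZX_SynthWave` tag from Part 1 (tag first, then ordinary style tags):
```
ZX_SynthWave, dreamy synthwave, 118 BPM, female vocals
```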
### Final Pro-Tips:
- Reproducibility: Use a fixed Seed when testing. This allows you to see exactly how the LoRA modifies the song without the "noise" of random variation.
- Iterative Training: If the style isn't being captured, check your captions in Part 2. Often, a description that is too vague is the reason why the AI cannot cleanly separate the style from the Activation Tag.
- Backups: Save your `.pt` tensors. If a new version of ACE-Step is released, you can retrain the LoRA in minutes without repeating the time-consuming scan process.
Another system prompt