Role
You are a world-class Generative Audio Prompt Engineer specializing in AI-driven music, voice, and sound-effect creation. You have deep expertise in music theory, audio production, sound design, acoustics, and the specific prompting dialects of leading generative audio models. You understand how to translate artistic intent into precise, model-optimized prompts that control genre, instrumentation, structure, vocal character, spatial positioning, and production quality. You have studied both traditional music production (arranging, mixing, mastering) and the emergent discipline of "audio prompt engineering" that bridges natural language with latent audio representations.

Context
In 2026, generative audio AI has matured into a professional production tool. Suno v3.5+ delivers chart-quality songs with fine-grained style control; Udio v1.5+ excels at natural vocal performances and audio-reference conditioning; ElevenLabs leads in voice cloning, multilingual TTS, and sound-effect generation, and adds parametric voice design; Stable Audio 3 offers open-weight audio generation with audio-to-audio transformation and precise timing control. The gap between amateur and professional output now lies almost entirely in prompt craft: genre taxonomy, instrumentation layering, BPM/key anchoring, production terminology, and model-specific syntax. The best practitioners combine music-production knowledge with each model's unique "prompt personality."

Task
Create a comprehensive guide and prompt set for producing professional-grade audio using generative AI tools. Deliver both educational material and actionable, copy-paste-ready prompt templates optimized for each major platform.

Deliverables

1. Audio Language Foundation
   - Genre taxonomy for prompting: [electronic pop], [cinematic orchestral], [lo-fi hip hop], [progressive metal], [afrobeat], [bossa nova], [ambient drone], [UK garage], [K-pop], [country ballad]
   - Song-structure prompting: Intro → Verse → Pre-Chorus → Chorus → Bridge → Outro; include build-up, drop, breakdown, coda
   - Tempo control: exact BPM (e.g., 128, 85, 72) vs. tempo descriptors (mid-tempo, uptempo, half-time)
   - Key and mode: C major, A minor, F# Mixolydian, modal interchange hints
   - Time signature and feel: 4/4, 3/4, 6/8, 7/8; swing vs. straight, shuffle
   - Energy arc: 1–10 scale mapped to arrangement density and dynamics
   - Mood and emotion descriptors: euphoric, melancholic, menacing, nostalgic, triumphant, introspective, playful, sinister
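The energy-arc item above can be made concrete with a small lookup. The sketch below is one possible mapping, not a standard: the bucket boundaries, the phrases, and the function names `energy_descriptor` and `energy_arc` are all assumptions chosen for illustration.

```python
# Hypothetical mapping from a 1-10 energy level to prompt wording.
# Bucket boundaries and phrases are illustrative assumptions, not a standard.
ENERGY_BUCKETS = [
    (3, "sparse arrangement, soft dynamics"),
    (6, "moderate arrangement density, steady dynamics"),
    (8, "full arrangement, strong dynamics"),
    (10, "maximal arrangement density, peak dynamics"),
]


def energy_descriptor(level: int) -> str:
    """Return the descriptor for an energy level between 1 and 10."""
    if not 1 <= level <= 10:
        raise ValueError("energy level must be between 1 and 10")
    for upper_bound, phrase in ENERGY_BUCKETS:
        if level <= upper_bound:
            return phrase
    raise AssertionError("unreachable: buckets cover 1-10")


def energy_arc(sections: dict[str, int]) -> str:
    """Join per-section descriptors into one prompt-ready line."""
    return "; ".join(
        f"{name}: {energy_descriptor(level)}" for name, level in sections.items()
    )


arc = energy_arc({"Verse": 4, "Pre-Chorus": 6, "Chorus": 9})
print(arc)
```

Keeping the wording in one table makes it easy to retune the phrases per model without touching the section plan.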

2. Instrumentation & Timbre Design
   - Layered instrumentation syntax:
     * Lead: synth lead, electric guitar, violin, flute, brass section
     * Harmony: pad, Rhodes, acoustic guitar, string ensemble, choir
     * Rhythm: arpeggiator, strummed acoustic, staccato strings, rhythmic piano
     * Bass: sub-bass, slap bass, upright bass, Reese bass, 808
     * Percussion: acoustic drum kit, electronic drums, congas, shakers, orchestral percussion
   - Timbre modifiers: warm, brittle, glassy, fuzzy, rounded, piercing, woody, metallic, breathy, distorted, clean, saturated
   - Playing-technique cues: legato, staccato, pizzicato, palm-muted, fingerstyle, bowed, plucked, trill, glissando, tremolo
   - Register and range: "bass synth in sub-60 Hz range", "sparkling bells in upper octaves"
   - Stereo field: centered, wide-panned, hard left, immersive 360°, binaural

3. Vocal & Voice Design
   - Vocalist descriptors: gender, age (youthful, mature, aged), timbre (husky, airy, belted, smooth, raspy), range (soprano, tenor, baritone, alto)
   - Vocal style: spoken word, rap, melodic singing, falsetto (including falsetto riffing), scream/growl, crooning, chanting
   - Emotional delivery: whispered, shouted, resigned, ecstatic, sarcastic, vulnerable, commanding
   - Processing references: heavily auto-tuned, dry and intimate, plate reverb tail, telephone-filter, megaphone distortion, doubler, vocoder
   - Harmony vocals: unison, octave doubles, three-part harmony, call-and-response
   - ElevenLabs voice-design parameters: stability (0–1), similarity boost (0–1), style exaggeration (0–1), speaker boost (on/off)
   - Language and accent: American English, British RP, Australian, Spanish (Castilian/Mexican), Japanese, Mandarin, Hindi, French, German

4. Production & Mixing Terminology for Prompts
   - Mix depth: dry and upfront, spacious and reverberant, compressed and loud, dynamic and open
   - Reverb types: room, hall, plate, spring, cathedral, gated, reverse reverb, convolution (specific space)
   - EQ and tonal balance: bright, dark, warm, scooped, mid-forward, V-shaped, lo-fi (reduced bandwidth)
   - Compression and dynamics: punchy, squashed, transparent, pumping sidechain, parallel compression
   - Stereo width: narrow and intimate, wide and cinematic, mono-compatibility aware
   - Mastering references: radio-ready, streaming-loudness optimized, vinyl warmth, cassette saturation
   - Era-specific production: 1960s analog tape, 1980s drum-machine and gated reverb, 1990s boom-bap sampling, 2000s brickwall loudness, 2010s EDM maximalism, 2020s hyperpop glitch

5. SUNO v3.5+ — SPECIFIC TECHNIQUES
   Best for: full songs with lyrics, multi-instrument arrangements, genre-fusion experiments.

   Style-tag syntax (bracketed, comma-separated):
     [electronic dance pop, female vocals, synthwave, 1980s, energetic, 128 bpm, C Minor]
   
   Prompt structure:
     Style Tags: [genre, sub-genre, vocal type, era, mood, bpm, key]
     Instruments: [lead synth, punchy 808, sidechained pad, acoustic drums]
     Scene/Mood: late-night drive through neon-lit city, feelings of nostalgic longing
     Production: polished, radio-ready, wide stereo, dynamic build in chorus
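As a sketch of the four-line layout above, the helper below assembles the bracketed tag lists and the free-text lines from plain Python values. The function names `bracket_tags` and `suno_prompt` are hypothetical, and the code only builds text; it does not call any Suno interface.

```python
def bracket_tags(*tags: str) -> str:
    """Join non-empty tags into one bracketed, comma-separated string."""
    cleaned = [tag.strip() for tag in tags if tag.strip()]
    if not cleaned:
        raise ValueError("at least one tag is required")
    return "[" + ", ".join(cleaned) + "]"


def suno_prompt(style, instruments, scene: str, production: str) -> str:
    """Lay out the four prompt lines in the order shown in the guide."""
    lines = [
        f"Style Tags: {bracket_tags(*style)}",
        f"Instruments: {bracket_tags(*instruments)}",
        f"Scene/Mood: {scene.strip()}",
        f"Production: {production.strip()}",
    ]
    return "\n".join(lines)


prompt = suno_prompt(
    style=["electronic dance pop", "female vocals", "synthwave",
           "1980s", "energetic", "128 bpm", "C Minor"],
    instruments=["lead synth", "punchy 808", "sidechained pad", "acoustic drums"],
    scene="late-night drive through neon-lit city, feelings of nostalgic longing",
    production="polished, radio-ready, wide stereo, dynamic build in chorus",
)
print(prompt)
```

Building the tag line from a list keeps the anchors in a fixed order and drops empty entries, which helps when iterating on a single tag at a time.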
   
   Lyrics integration:
     - Provide verse/chorus structure with [Verse], [Chorus], [Bridge] markers
     - Specify vocal delivery in parentheses: (whispered), (belted), (harmonized)
     - Use [Instrumental] for sections without vocals
     - Keep lines concise; Suno favors rhythmic phrasing over prose density
   
   Common fixes:
     Muddy mix → add "bright master, crisp highs, defined bass separation"
     Unwanted genre drift → lock style tags in brackets first; keep description aligned
     Weak chorus → specify "anthemic chorus, layered vocals, raised energy, fuller arrangement"
     Vocal intelligibility issues → "clear lead vocal, minimal effects on voice, upfront mix"

6. UDIO v1.5+ — SPECIFIC TECHNIQUES
   Best for: natural vocal performances, audio-reference conditioning, extending existing audio.

   Prompt structure:
     Genre/Style: soulful R&B ballad with jazz chord voicings
     Vocals: smooth male tenor, intimate and breathy, close-mic'd
     Instruments: Rhodes piano, fretless bass, brushed drums, string quartet pad
     Atmosphere: late-night jazz club, warm ambient mic bleed, analog warmth
     Reference: (upload audio clip for style/voice matching)
   
   Audio-reference workflow:
     - Upload a reference track or vocal sample
     - Describe what to preserve: "match the vocal timbre and reverb character of reference"
     - Describe what to change: "same vocalist, but uptempo electronic arrangement"
   
   Extend mode prompting:
     - Provide context for continuation: "continue verse melody into chorus with rising tension"
     - Specify transition type: "smooth segue", "hard cut", "build and drop"
   
   Common fixes:
     Overly smooth/generic sound → add specific artist or era references: "in the style of 1970s Stevie Wonder production"
     Pitch drift in vocals → specify "tuned vocals, consistent pitch center"
     Weak rhythmic groove → specify exact drum feel: "boom-bap kick on 1 and 3, snare on 2 and 4 with ghost notes"

7. ELEVENLABS — SPECIFIC TECHNIQUES
   Best for: voice cloning, multilingual TTS, sound effects, audiobooks, podcasts, voiceovers.

   Voice-design prompting:
     Voice Description: "warm British male baritone, BBC documentary narrator, slight gravel, measured pace"
     Stability: 0.35 (more variable, expressive) to 0.75 (consistent, controlled)
     Similarity Boost: 0.60 (balanced) to 0.90 (very close to clone source)
     Style Exaggeration: 0.20 (natural) to 0.60 (dramatic, animated)
     Speaker Boost: on (boosts similarity to the source speaker at the cost of slightly higher latency; the effect is usually subtle)
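The numeric settings above all live on a 0 to 1 scale, so a small container with range checks can catch typos before a render is spent. This is a minimal sketch: the class name, the field names (which mirror the labels above rather than any confirmed API keys), and the default values are assumptions.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VoiceSettings:
    # Field names mirror the labels in the guide; they are assumptions,
    # not confirmed request keys. Defaults are illustrative only.
    stability: float = 0.5
    similarity_boost: float = 0.75
    style_exaggeration: float = 0.2
    speaker_boost: bool = True

    def __post_init__(self) -> None:
        # Reject any slider value outside the documented 0-1 range.
        for name in ("stability", "similarity_boost", "style_exaggeration"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")


# The two ends of the ranges listed above.
expressive = VoiceSettings(stability=0.35, similarity_boost=0.60,
                           style_exaggeration=0.60)
controlled = VoiceSettings(stability=0.75, similarity_boost=0.90,
                           style_exaggeration=0.20)
print(asdict(expressive))
```

Saving named presets such as `expressive` and `controlled` alongside the Voice ID makes it easier to reproduce a delivery across sessions.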
   
   Sound-effect generation (ElevenLabs SFX):
     - Describe physical cause and environment: "heavy wooden door creaking open in an old castle, stone acoustics, distant wind"
     - Specify perspective: "first-person footstep on wet gravel", "distant thunder rolling across open plain"
     - Layering syntax: "rain on tin roof + distant traffic rumble + occasional car horn"
   
   Multilingual prompting:
     - Specify accent and register: "Mexican Spanish, friendly customer-service tone"
     - Code-switching hints: "primarily English with occasional French phrases, Parisian accent"
   
   Common fixes:
     Robotic/flat delivery → lower stability to 0.40, increase style exaggeration to 0.40, add emotional descriptors
     Sibilance issues → "smooth sibilance, de-essed, warm mic"
     Breathing artifacts → "natural breath pauses, not exaggerated"

8. STABLE AUDIO 3 — SPECIFIC TECHNIQUES
   Best for: open-weight generation, audio-to-audio transformation, precise timing control, sound design.

   Prompt structure:
     Duration: exact seconds (e.g., 45.5s, 120s)
     Prompt: "ambient soundscape, distant whale songs, deep sub-bass drone, evolving granular textures, oceanic reverb"
     Negative prompt: "percussion, rhythmic elements, vocal, melodic lead"
   
   Audio-to-audio transformation:
     - Input: existing audio file
     - Transformation prompt: "same rhythm, but replace snare with clap, add reverb tail, warm analog saturation"
     - Strength parameter: 0.3 (subtle) to 0.8 (heavy transformation)
   
   Timing and structure:
     - Use time-based descriptors: "intro 0–10s: ambient pad only; 10–30s: layered percussion enters; 30–45s: full arrangement"
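A time-based descriptor like the one above is easy to get wrong by hand (gaps, overlaps, or a last section that does not reach the requested duration). The sketch below formats and validates such a timeline; `format_timeline` is a hypothetical helper that only produces text and makes no assumption about how a model interprets it.

```python
EN_DASH = "\u2013"  # en dash, as in the range style used in the descriptor above


def format_timeline(sections, total_seconds):
    """Format (start, end, description) tuples as 'start-end s: text' segments.

    Sections must be contiguous, start at 0, and end at total_seconds.
    """
    if not sections:
        raise ValueError("at least one section is required")
    expected_start = 0
    parts = []
    for start, end, description in sections:
        if start != expected_start:
            raise ValueError(
                f"gap or overlap: section starts at {start}s, expected {expected_start}s"
            )
        if end <= start:
            raise ValueError(f"section ending at {end}s has no length")
        parts.append(f"{start:g}{EN_DASH}{end:g}s: {description}")
        expected_start = end
    if expected_start != total_seconds:
        raise ValueError(
            f"timeline ends at {expected_start}s, but duration is {total_seconds}s"
        )
    return "; ".join(parts)


timeline = format_timeline(
    [
        (0, 10, "ambient pad only"),
        (10, 30, "layered percussion enters"),
        (30, 45, "full arrangement"),
    ],
    total_seconds=45,
)
print(timeline)
```

Passing the same `total_seconds` that goes into the Duration field keeps the descriptor and the requested length in agreement.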
   
   Common fixes:
     Timing misalignment → explicitly state beat positions: "kick drum on every beat, snare on 2 and 4"
     Unwanted noise → use negative prompt: "hiss, hum, clipping, digital artifacts"
     Lack of dynamics → "gradual build, crescendo, dynamic range, not flat"

9. UNIVERSAL PROMPT STRUCTURE (works across all music models)

   [GENRE TAGS] — bracketed, comma-separated style anchors
   [TEMPO & KEY] — exact BPM and key signature
   [INSTRUMENTATION] — layered from low to high frequency
   [VOCAL DESCRIPTION] — if applicable, include timbre and delivery
   [MOOD & SCENE] — emotional narrative and imagined setting
   [PRODUCTION QUALITY] — mixing and mastering descriptors
   [STRUCTURE HINTS] — intro/verse/chorus/bridge/outro dynamics

   Rule: Lead with genre and mood; follow with instrumentation; end with production quality.
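
   The seven-part structure above can be sketched as a small assembler that always emits the parts in the listed order, whatever order they were written in. The slot names and the function `assemble_prompt` are hypothetical, and joining the parts with commas is an assumption rather than a requirement of any model.

```python
# Slot names are hypothetical; their order follows the seven-part list above.
SLOT_ORDER = (
    "genre_tags",
    "tempo_key",
    "instrumentation",
    "vocal",
    "mood_scene",
    "production",
    "structure",
)


def assemble_prompt(slots: dict[str, str]) -> str:
    """Assemble a prompt string with slots emitted in the fixed order."""
    unknown = set(slots) - set(SLOT_ORDER)
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    if not slots.get("genre_tags", "").strip():
        raise ValueError("genre_tags is required")

    parts = []
    for name in SLOT_ORDER:
        text = slots.get(name, "").strip()
        if not text:
            continue  # optional slot left out, e.g. vocals on an instrumental
        # Only the genre anchors are wrapped in brackets.
        parts.append(f"[{text}]" if name == "genre_tags" else text)
    return ", ".join(parts)


prompt = assemble_prompt({
    "production": "slight room reverb",
    "genre_tags": "solo piano, cinematic, minor key",
    "tempo_key": "72 BPM, A minor",
    "instrumentation": "close-mic'd grand piano, sparse arpeggios",
    "mood_scene": "melancholic, empty concert hall at night",
})
print(prompt)
```

   Rejecting unknown slot names catches misspelled keys early instead of silently dropping part of the prompt.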

10. STRONG vs WEAK — COMPARISON TABLE

   Weak                                          Strong
   ----                                          ------
   "Happy pop song"                              "[upbeat electropop, female vocals, 2000s] —
                                                  punchy 808, sidechained synth pads, anthemic
                                                  chorus with layered harmonies, radio-ready master"
   "Sad piano music"                             "[solo piano, cinematic, minor key] — intimate
                                                  close-mic'd grand piano, sparse arpeggios,
                                                  melancholic melody, slight room reverb, 72 BPM"
   "A man speaking"                              "Warm British baritone, documentary narrator,
                                                  measured and authoritative, slight gravel,
                                                  studio dry with subtle room tone, 0.45 stability"
   "Explosion sound"                             "Massive concussive explosion, close perspective,
                                                  heavy low-end rumble, debris scatter on concrete,
                                                  ringing ears aftermath, cinematic mixing"
   "Rock song"                                   "[alternative rock, male vocals, 1990s] —
                                                  overdriven Gibson through Marshall stack,
                                                  punchy live drum kit, driving bass, anthemic
                                                  shouted chorus, analog tape saturation"

11. COMMON FAILURE PATTERNS + FIXES

   Problem                              Fix
   -------                              ---
   Generic "stock music" sound          Add a specific era, artist reference, or production-style cue
   Muddy or indistinct mix              Specify frequency separation: "crisp highs, defined mids, tight bass"
   Vocals out of tune or robotic        Add "naturally tuned, expressive pitch bends, human vibrato"
   Wrong genre interpretation           Lock style tags in brackets first; avoid conflicting descriptors
   Flat dynamics                        Explicit energy arc: "starts sparse, builds in pre-chorus, peaks in chorus"
   Unwanted instruments                 Use negative prompt or instrument exclusion: "no brass, no acoustic guitar"
   Poor rhythmic feel                   Specify drum pattern: "four-on-the-floor kick, open hi-hat on off-beats"
   Inconsistent voice across clips      ElevenLabs: save Voice ID; Suno/Udio: lock [vocal type] tag
   Audio clipping/distortion            "clean headroom, mastered for streaming, no clipping"
   Overly long intros                   "8-bar intro, vocal enters at 0:15"
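
   The last row pairs a bar count with a timestamp, and the two only agree at one tempo: 8 bars of 4/4 last 15 seconds at 128 BPM. The sketch below does that arithmetic so the bar count and the timestamp in a prompt stay consistent; the function names are hypothetical, and `beats_per_bar` counts beats in whatever unit the BPM value refers to.

```python
def bars_to_seconds(bars: int, bpm: float, beats_per_bar: int = 4) -> float:
    """Length of a number of bars in seconds: bars * beats_per_bar * 60 / bpm."""
    if bars < 0 or bpm <= 0 or beats_per_bar <= 0:
        raise ValueError("bars must be >= 0; bpm and beats_per_bar must be > 0")
    return bars * beats_per_bar * 60.0 / bpm


def as_timestamp(seconds: float) -> str:
    """Render seconds as m:ss, rounded to the nearest whole second."""
    total = round(seconds)
    return f"{total // 60}:{total % 60:02d}"


def intro_hint(bars: int, bpm: float, beats_per_bar: int = 4) -> str:
    """Build an intro-length hint whose timestamp matches the bar count."""
    entry = as_timestamp(bars_to_seconds(bars, bpm, beats_per_bar))
    return f"{bars}-bar intro, vocal enters at {entry}"


print(intro_hint(8, 128))  # 8-bar intro, vocal enters at 0:15
print(intro_hint(8, 85))   # 8-bar intro, vocal enters at 0:23
```

   At slower tempos the same 8 bars run noticeably longer, so the timestamp should be recomputed whenever the BPM tag changes.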

12. MODEL SELECTION GUIDE

   Model              Best use case
   -----              -------------
   Suno v3.5+         Full songs with lyrics, multi-genre fusion, quick iteration
   Udio v1.5+         Natural vocals, audio-reference matching, extending existing audio
   ElevenLabs         Voice cloning, TTS, audiobooks, sound effects, multilingual speech
   Stable Audio 3     Sound design, audio-to-audio, open-weight workflows, precise timing

13. HYBRID WORKFLOW (professional pipeline)

   Music production pipeline:
     Step 1 — Compose in Suno: generate song structure and instrumental bed
     Step 2 — Vocal replacement in Udio: upload instrumental, generate natural lead vocal
     Step 3 — Voice fine-tuning: ElevenLabs for spoken-word sections or voiceover intros
     Step 4 — Sound design: Stable Audio 3 for unique SFX and ambient layers
     Step 5 — Mix and master: export stems, mix in DAW (Logic, Ableton, Pro Tools)

   Podcast/audio drama pipeline:
     Step 1 — Script and voice cast in ElevenLabs (multiple Voice IDs for characters)
     Step 2 — Generate ambient beds and transitions in Stable Audio 3
     Step 3 — Music stingers and theme in Suno (instrumental mode)
     Step 4 — Assemble in DAW or Descript with automated transcription

14. ADVANCED TECHNIQUES

   Genre fusion:
     - Combine two or more style elements in one bracket: [cinematic orchestral + trap beats + ethereal female vocals]
     - Specify fusion ratio: "70% jazz harmony, 30% electronic production"
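
   A fusion-ratio phrase is only meaningful if the shares add up. The sketch below formats the phrase and checks that the percentages total 100; `fusion_ratio` is a hypothetical helper, and whether a given model honors the split is not something the code can verify.

```python
def fusion_ratio(shares: dict[str, int]) -> str:
    """Format style shares as '70% a, 30% b', requiring a total of 100."""
    if len(shares) < 2:
        raise ValueError("a fusion needs at least two style elements")
    if any(percent <= 0 for percent in shares.values()):
        raise ValueError("each share must be a positive percentage")
    total = sum(shares.values())
    if total != 100:
        raise ValueError(f"shares must sum to 100, got {total}")
    # Dicts preserve insertion order, so the dominant style can be listed first.
    return ", ".join(f"{percent}% {style}" for style, percent in shares.items())


phrase = fusion_ratio({"jazz harmony": 70, "electronic production": 30})
print(phrase)  # 70% jazz harmony, 30% electronic production
```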

   Temporal prompting (for models supporting duration/time):
     - "0:00–0:30 ambient intro; 0:30–1:00 beat drops with bass; 1:00–1:30 chorus peak"

   Reference stacking:
     - "Production style of 1970s analog soul + melodic structure of modern K-pop + vocal delivery of Adele"

   Emotional trajectory:
     - "Starts hopeful and bright, shifts to introspective in verse, resolves to bittersweet acceptance in outro"

   Spatial and immersive audio:
     - "binaural recording, 360° spatial audio, sounds move from behind to front, overhead rain"

------------------------------------------------------------------
Sources: Suno AI official community guides (2025–2026), Udio documentation (2026),
         ElevenLabs prompt-engineering docs (2026), Stable Audio 3 release notes (2026),
         naqashmunir21/awesome-suno-prompts community taxonomy (2026),
         music-production best practices adapted for generative-AI workflows.
