AI Workflow SOP
Higgsfield & HeyGen Production Pipeline
1. Purpose of This SOP
This SOP documents our current AI-based image and video production workflow, primarily using Higgsfield and HeyGen, to ensure consistency, quality, and cost control across projects.
AI tools are evolving rapidly, and platforms, models, and capabilities will continue to change. While specific tools or settings may be updated over time, the core principles of our workflow (planning first, controlled experimentation, quality checks, and cost awareness) remain constant.
This document focuses on those principles, along with our current best practices.
This SOP is intended for:
New editors and AI artists joining the team
Internal team members working on AI-assisted projects
Academy students learning real-world production workflows
2. What Is the “Higgsfield Workflow” (High-Level Overview)
The Higgsfield workflow refers to our structured approach to generating AI-based images and videos using Higgsfield as a primary all-in-one generation platform.
Why Higgsfield?
We chose Higgsfield because:
It combines image generation and video generation in a single platform
It supports multiple high-quality models under one subscription
It allows faster iteration compared to managing multiple separate tools
It fits well into a production pipeline where AI outputs are later refined in tools like After Effects
Higgsfield is not used in isolation. It sits within a broader production pipeline that may include:
Stock footage
Motion graphics
Manual compositing and editing
Voiceover and spokesperson generation (via HeyGen)
The workflow is designed to support production, not replace creative or editorial decision-making.
3. When We Decide to Use AI
We use AI only when it makes practical sense for the project.
Primary trigger
When the client explicitly asks for AI-based visuals or videos
Additional conditions
AI usage must be:
Clearly defined (the client’s expectation is understandable and achievable)
Practical within current AI limitations
Aligned with project timelines and quality standards
If the request is vague or unrealistic, we clarify the scope before generating.
AI is not used as a guessing tool.
We avoid AI when:
Stock footage or motion design can achieve better control
Visual accuracy is critical, and AI inconsistencies would cause delays
The output requires precise typography, branding, or layouts that AI cannot reliably deliver
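The decision rules in this section can be read as a simple checklist. The sketch below is one possible reading, not an official tool: the AiRequest fields and the decide function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AiRequest:
    """Hypothetical checklist for one client request (field names are illustrative)."""
    client_asked_for_ai: bool
    scope_is_clear: bool
    feasible_with_current_tools: bool
    fits_timeline_and_quality: bool
    stock_or_motion_design_is_better: bool = False
    needs_precise_text_or_branding: bool = False


def decide(request: AiRequest) -> str:
    """Return 'use-ai', 'no-ai', or 'clarify-scope' following Section 3."""
    # Primary trigger: the client explicitly asks for AI-based visuals.
    if not request.client_asked_for_ai:
        return "no-ai"
    # Vague requests are clarified before any generation.
    if not request.scope_is_clear:
        return "clarify-scope"
    # Cases where we avoid AI even if the client asked for it.
    if request.stock_or_motion_design_is_better or request.needs_precise_text_or_branding:
        return "no-ai"
    # Unrealistic requests also go back for scope clarification.
    if not (request.feasible_with_current_tools and request.fits_timeline_and_quality):
        return "clarify-scope"
    return "use-ai"
```

In practice this is a conversation with the client and producer, not a script; the sketch only makes the order of the checks explicit.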
4. Tools We Use (Current)
4.1 Higgsfield (Image & Video Generation)
Higgsfield is our primary AI tool for:
Concept images
Start and end frames for AI videos
Short AI-generated video sequences
It supports multiple internal models with varying quality, speed, and credit cost.

4.2 HeyGen (Spokesperson Videos)
HeyGen is used specifically for:
AI spokesperson videos
Talking-head or presenter-style content
Scenarios where a human presence is required but live shooting is not feasible
HeyGen has its own workflow, separate from Higgsfield:
Script preparation is critical
Visual customization is limited compared to manual shoots
Outputs are often combined later with motion graphics or UI animations
HeyGen is used only when a spokesperson format is clearly required.

5. Core Production Workflow (Step-by-Step)
This workflow applies mainly to Higgsfield-based projects.
Step 1: Concept & Intent Clarity
Before any AI generation:
Define what the shot must communicate
Decide whether AI is used for:
Atmosphere
Illustration
Abstract visuals
Product-style representation
Identify if the output is a final asset or an intermediate asset for further editing
Step 2: Prompt Preparation (Critical Step)
Prompt preparation is the most important part of our AI workflow.
Most quality issues and credit waste come from weak or unclear prompts.
5.2.1 Official References & Learning
Always refer to Higgsfield’s official tutorials (https://www.youtube.com/watch?v=AdjllfZuqYM) for:
Model behavior
Prompt structure
New features and limitations
If you are not confident in writing prompts, use GPT to:
Refine your idea
Structure complex prompts
Convert visual intent into technical descriptions
GPT is a support tool, not a replacement for understanding the scene.
5.2.2 Prompting for Single Images vs Sequences
Generating one image at a time works fine for:
Standalone visuals
Mood shots
Abstract or atmospheric scenes
However, this approach fails for sequences, such as:
Story-driven visuals
Product showcases
Consistent characters
Multi-shot video scenes
Why?
Each new generation subtly changes:
Backgrounds
Lighting
Product details
Facial structure
This breaks visual continuity and makes sequence videos unusable.
5.2.3 About Higgsfield’s Popcorn Storyboard Feature
Higgsfield provides a Popcorn Storyboard feature that:
Accepts reference images
Generates scene-based sequences from descriptions
While useful in theory, in practice:
It has creative limitations
It frequently flags NSFW (Not Safe For Work) errors, even for harmless details
This disrupts production flow and wastes time
Because of this, Popcorn Storyboard is not reliable for consistent production work.
5.2.4 Our Proven Method for Consistent Sequences (Recommended)
For consistency in story, product, or character development, we use the following manual but reliable approach.
Step A: Generate a Cinematic Contact Sheet (3×3 Grid)
Instead of generating separate images, we:
Use the normal image generation page
Select a preferred model (commonly Nano Banana or Nano Banana Pro)
Generate a single 3×3 cinematic storyboard grid
This forces the AI to:
Lock the subject
Maintain environmental consistency
Preserve lighting, colors, and proportions
Example Prompt Structure (Reference Prompt)
Purpose: Generate 9 consistent cinematic shots of the same subject(s) in one environment.
Analyze the entire movie scene. Identify ALL key subjects present (whether it's a single person,
a group/couple, a vehicle, or a specific object) and their spatial relationship/interaction.
Generate a cohesive 3x3 grid "Cinematic Contact Sheet" featuring 9 distinct camera shots of
exactly these subjects in the same environment.
You must adapt the standard cinematic shot types to fit the content:
- If a group, keep the group together
- If an object, frame the whole object
Row 1 (Establishing Context):
1. Extreme Long Shot (ELS)
2. Long Shot (LS)
3. Medium Long Shot (3/4 view)
Row 2 (Core Coverage):
4. Medium Shot (MS)
5. Medium Close-Up (MCU)
6. Close-Up (CU)
Row 3 (Details & Angles):
7. Extreme Close-Up (ECU)
8. Low Angle Shot
9. High Angle Shot
Ensure strict consistency:
- Same people or objects
- Same clothing/materials
- Same lighting and environment
Photorealistic textures, cinematic color grading, and realistic depth of field.
No repeated shots.
This grid becomes the visual backbone of the entire sequence.
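To avoid retyping the shot list for every project, the reference prompt above can be assembled from a template. This is a minimal sketch under our own assumptions: build_contact_sheet_prompt and the leading "Scene:" line are hypothetical additions, and the output is plain text to paste into the image generation page (it does not call any Higgsfield API).

```python
# Shot list copied from the reference prompt above, grouped by grid row.
ROWS = {
    "Row 1 (Establishing Context)": [
        "Extreme Long Shot (ELS)",
        "Long Shot (LS)",
        "Medium Long Shot (3/4 view)",
    ],
    "Row 2 (Core Coverage)": [
        "Medium Shot (MS)",
        "Medium Close-Up (MCU)",
        "Close-Up (CU)",
    ],
    "Row 3 (Details & Angles)": [
        "Extreme Close-Up (ECU)",
        "Low Angle Shot",
        "High Angle Shot",
    ],
}


def build_contact_sheet_prompt(scene_description: str) -> str:
    """Assemble a 3x3 contact sheet prompt for one scene (hypothetical helper)."""
    lines = [
        f"Scene: {scene_description}",  # illustrative extra line, not in the reference prompt
        'Generate a cohesive 3x3 grid "Cinematic Contact Sheet" featuring 9 distinct '
        "camera shots of exactly these subjects in the same environment.",
    ]
    shot_number = 1
    for row_title, shots in ROWS.items():
        lines.append(f"{row_title}:")
        for shot in shots:
            lines.append(f"{shot_number}. {shot}")
            shot_number += 1
    lines.append(
        "Ensure strict consistency: same people or objects, "
        "same clothing/materials, same lighting and environment."
    )
    lines.append("No repeated shots.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_contact_sheet_prompt("A red bicycle in a rainy alley at night"))
```

Keeping the shot list in one place means every editor sends the same nine shot types, which makes grids easier to compare across projects.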
5.2.5 Extracting High-Quality Frames from the Grid
Because the contact sheet is a single image:
Direct cropping may result in blur or quality loss
We use two reliable methods:
Method 1: AI-Based Frame Extraction
Re-upload the grid
Prompt example:
Extract a high-resolution frame from row 1, column 2.
Maintain original subject integrity, lighting, and proportions.
Repeat for all required frames.
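Since the same extraction prompt is repeated for each cell, the nine prompts can be prepared up front. The sketch below only generates text; extraction_prompts is a hypothetical helper name, and each prompt still has to be submitted manually with the re-uploaded grid.

```python
def extraction_prompts(rows: int = 3, cols: int = 3) -> list[str]:
    """Return one extraction prompt per grid cell, in reading order."""
    prompts = []
    for row in range(1, rows + 1):
        for col in range(1, cols + 1):
            prompts.append(
                f"Extract a high-resolution frame from row {row}, column {col}. "
                "Maintain original subject integrity, lighting, and proportions."
            )
    return prompts


if __name__ == "__main__":
    for prompt in extraction_prompts():
        print(prompt)
```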
Method 2: Crop + Enhance
Manually crop the desired frame
Upload it back into Higgsfield
Use an enhancement prompt:
High-resolution upscale, extreme detail.
Enhance clarity and fine textures while maintaining the original subject’s integrity.
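For the manual crop in Method 2, the pixel box of each cell follows directly from the grid size. The sketch below assumes the contact sheet is an evenly divided grid with no borders or gutters between cells, which should be checked against the actual output; crop_box is a hypothetical helper.

```python
def crop_box(width: int, height: int, row: int, col: int,
             rows: int = 3, cols: int = 3) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) in pixels for a 1-indexed grid cell.

    Assumes equal-sized cells with no gutters between them.
    """
    if not (1 <= row <= rows and 1 <= col <= cols):
        raise ValueError("row/col outside the grid")
    left = (col - 1) * width // cols
    right = col * width // cols
    top = (row - 1) * height // rows
    bottom = row * height // rows
    return left, top, right, bottom


if __name__ == "__main__":
    # Example: a 3072x3072 contact sheet, frame at row 1, column 2.
    print(crop_box(3072, 3072, row=1, col=2))  # (1024, 0, 2048, 1024)
```

The returned tuple uses the (left, upper, right, lower) order that image libraries such as Pillow accept for cropping, so it can be reused if the crop is ever scripted. Each cropped cell has roughly one third of the grid's width and height, which is why the enhancement pass is needed afterwards.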
5.2.6 Moving to Video Generation
Once we have:
Clean
High-resolution
Consistent frames
Only then do we proceed to video generation (Minimax / Kling).
Starting from prepared frames gives us:
Fewer retries
Better continuity
Reduced credit waste
This workflow is mandatory for:
Sequence videos
Product stories
Character-driven visuals
5.2.7 Review & Selection
After generation:
Review for visual consistency
Check faces, hands, text, and layout
Select only usable outputs for further processing
Rejected outputs are part of the process but must be minimized.
6. HeyGen Spokesperson Video Workflow
This workflow is used for AI spokesperson / talking-head videos where a human presence is required without live shooting. HeyGen is used after visual preparation is completed.
Avatar Image Preparation (Higgsfield)
All HeyGen avatars start with a high-quality image generated in Higgsfield.
Requirements for the Avatar Image
Front-facing or slight 3/4 angle
Neutral facial expression
Clear facial features (eyes, lips, jawline)
No motion blur or extreme lighting
Simple or studio-style background
Poor input images result in poor lip sync and unnatural movement, so do not skip this step.
Once finalized, export the image at the highest available resolution.

Creating the Virtual Character in HeyGen
Open HeyGen
Go to Avatars
Select Create a Virtual Character
Upload the Higgsfield image
Let HeyGen process and generate the avatar
Quality Check
Facial proportions look natural
Mouth and eye alignment is correct
No visible distortions
If issues appear, regenerate the image in Higgsfield instead of forcing fixes in HeyGen.

Audio Strategy (Decide First)
Before building the video, decide which audio source will be used:
Option A: HeyGen AI Voice
Used for:
Internal demos
Fast drafts
Non-brand-critical videos
Option B: External Audio (Preferred for Final Delivery)
Used when:
Client provides voiceover
The brand requires premium voice quality
Specific tone, accent, or pronunciation is needed
External audio is usually generated using ElevenLabs or provided directly by the client.
External Audio Workflow (Client Audio or ElevenLabs)
Preparing the Audio
Format: .mp3 or .wav
Clean, noise-free audio
Natural pacing (not rushed)
Final script only (no placeholders)
If generating audio:
Use ElevenLabs
Select the correct voice and tone
Generate the final voiceover
Download the audio file
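Two of the audio requirements above (file format and pacing) can be sanity-checked before upload. The sketch below is illustrative only: the 170 words-per-minute threshold is a hypothetical value, not a HeyGen or ElevenLabs rule, and the duration is assumed to be read from your audio editor.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".wav"}  # formats named in this SOP
MAX_WORDS_PER_MINUTE = 170  # hypothetical "rushed" threshold; tune per project


def is_accepted_format(filename: str) -> bool:
    """Check the file extension against the formats listed in this SOP."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def words_per_minute(script: str, duration_seconds: float) -> float:
    """Estimate speaking pace from the final script and the audio duration."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return len(script.split()) * 60 / duration_seconds


def is_rushed(script: str, duration_seconds: float) -> bool:
    """Flag audio whose pace exceeds the (assumed) threshold."""
    return words_per_minute(script, duration_seconds) > MAX_WORDS_PER_MINUTE
```

A flagged file is only a prompt to listen again; natural pacing is still judged by ear.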
Uploading Audio & Lip Sync in HeyGen
Open the HeyGen project
Upload the external audio file
HeyGen will:
Automatically transcribe the audio
Apply lip sync to the avatar based on the audio
Important Rules
Do not regenerate voice inside HeyGen once external audio is uploaded
Always review:
Lip sync accuracy
Timing of pauses
Natural facial movement
If lip sync feels off:
Slightly adjust the pacing in ElevenLabs
Re-upload the audio
Avoid regenerating the avatar unless necessary

Scene & Layout Setup
Configure the visual layout:
Avatar position (center / left / right)
Background:
Solid color
Gradient
Brand background (if provided)
Ensure safe framing:
Head not touching edges
Adequate space for text overlays
This ensures compatibility with website embeds and post-production graphics.
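The "head not touching edges" rule can be expressed as a margin check. This is a rough sketch with assumed values: the 5% margin is a hypothetical default, and the head box is assumed to be measured by hand in pixels from a preview frame.

```python
def is_safely_framed(frame_w: int, frame_h: int,
                     head_box: tuple[int, int, int, int],
                     margin_ratio: float = 0.05) -> bool:
    """Return True if the head box stays inside the safe margins.

    head_box is (left, top, right, bottom) in pixels.
    margin_ratio is an assumed safe-area fraction, not a HeyGen setting.
    """
    left, top, right, bottom = head_box
    margin_x = frame_w * margin_ratio
    margin_y = frame_h * margin_ratio
    return (
        left >= margin_x
        and top >= margin_y
        and right <= frame_w - margin_x
        and bottom <= frame_h - margin_y
    )
```

For a 1920×1080 frame, a 5% margin keeps the head at least 96 px from the left and right edges and 54 px from the top and bottom. Space for text overlays needs a separate, project-specific check.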
Video Generation & Review
Generate the video
Review for:
Lip sync accuracy
Facial realism
Natural head movement
No glitches or freezes
If issues appear:
First adjust the audio
Then adjust script pacing
Regenerate only if necessary
Export & Post-Production
Export the final video from HeyGen
Import into:
After Effects
Premiere Pro
Or any editing software used in the project
Common Post-Production Tasks
Add lower thirds
Insert UI overlays
Add subtitles
Apply brand styling
HeyGen output is treated as a base layer, not the final master.
7. Credit Usage Awareness
AI generation directly impacts cost.
Key points:
Every regeneration consumes credits
Higher resolution and longer duration increase the cost
Most credit loss comes from repeated retries, not final outputs
Editors are expected to:
Be mindful of attempts
Avoid random experimentation
Treat credits as a production resource
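One way to make retries visible is to log every attempt against a per-project budget. The sketch below is a minimal illustration: CreditLog is a hypothetical helper, and the credit numbers are made up, since actual costs depend on the model, resolution, and duration chosen.

```python
from dataclasses import dataclass, field


@dataclass
class CreditLog:
    """Hypothetical per-project log of generation attempts."""
    budget: int
    attempts: list = field(default_factory=list)  # (label, credits, kept)

    def record(self, label: str, credits: int, kept: bool) -> None:
        """Log one generation and whether its output was kept."""
        self.attempts.append((label, credits, kept))

    @property
    def spent(self) -> int:
        return sum(credits for _, credits, _ in self.attempts)

    @property
    def wasted(self) -> int:
        """Credits spent on outputs that were rejected."""
        return sum(credits for _, credits, kept in self.attempts if not kept)

    @property
    def remaining(self) -> int:
        return self.budget - self.spent


if __name__ == "__main__":
    log = CreditLog(budget=100)  # illustrative numbers only
    log.record("hero shot v1", 10, kept=False)
    log.record("hero shot v2", 10, kept=True)
    print(log.spent, log.wasted, log.remaining)  # 20 10 80
```

Comparing wasted against spent at the end of a project shows how much of the cost came from retries compared with final outputs.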
8. Quality Challenges & AI Limitations
AI outputs are not perfect by default.
Common issues include:
Incorrect or unreadable text
Facial or hand distortions
Layout inconsistencies
Missing or altered elements
Because of this:
Multiple attempts are often required
AI outputs are reviewed carefully before use
AI visuals are frequently refined further in post-production
Understanding these limitations prevents unrealistic expectations.
9. Optimization & Best Practices
To maintain quality while controlling cost, we follow these rules:
Use AI only when needed
Separate draft exploration from final generation
Use lower resolutions for testing
Move to 1080p only for final delivery
Prepare prompts and references before generating
Avoid last-minute experimentation
Optimization is not optional; it is part of professional AI usage.
10. Conclusion
AI is evolving rapidly. New models, features, and workflows are being released almost every day. Because of this, our workflow is not fixed; it will continue to adapt as better tools and methods emerge.
For now, this SOP reflects our most practical and reliable approach using Higgsfield and HeyGen. We are also observing production houses building internal AI systems, using APIs, and setting up dedicated servers for automation and scalability. We are actively researching these directions and may move toward hybrid or internal solutions in the future.
Until then, our focus is simple:
Use AI strategically, control costs, maintain quality, and keep creative direction at the center of every project.