AI Workflow SOP

Higgsfield & HeyGen Production Pipeline

1. Purpose of This SOP

This SOP documents our current AI-based image and video production workflow, primarily using Higgsfield and HeyGen, to ensure consistency, quality, and cost control across projects.

AI tools are evolving rapidly, and platforms, models, and capabilities will continue to change. While specific tools or settings may be updated over time, the core principles of our workflow (planning first, controlled experimentation, quality checks, and cost awareness) remain constant.
This document focuses on those principles, along with our current best practices.

This SOP is intended for:

  • New editors and AI artists joining the team

  • Internal team members working on AI-assisted projects

  • Academy students learning real-world production workflows

2. What Is the “Higgsfield Workflow” (High-Level Overview)

The Higgsfield workflow refers to our structured approach to generating AI-based images and videos using Higgsfield as a primary all-in-one generation platform.

Why Higgsfield?

We chose Higgsfield because:

  • It combines image generation and video generation in a single platform

  • It supports multiple high-quality models under one subscription

  • It allows faster iteration compared to managing multiple separate tools

  • It fits well into a production pipeline where AI outputs are later refined in tools like After Effects

Higgsfield is not used in isolation. It sits within a broader production pipeline that may include:

  • Stock footage

  • Motion graphics

  • Manual compositing and editing

  • Voiceover and spokesperson generation (via HeyGen)

The workflow is designed to support production, not replace creative or editorial decision-making.

3. When We Decide to Use AI

We use AI only when it makes practical sense for the project.

Primary trigger

  • When the client explicitly asks for AI-based visuals or videos

Additional conditions

AI usage must be:

  • Clearly defined (the client’s expectation is understandable and achievable)

  • Practical within current AI limitations

  • Aligned with project timelines and quality standards

If the request is vague or unrealistic, we clarify the scope before generating.
AI is not used as a guessing tool.

We avoid AI when:

  • Stock footage or motion design can achieve better control

  • Visual accuracy is critical, and AI inconsistencies would cause delays

  • The output requires precise typography, branding, or layouts that AI cannot reliably deliver

4. Tools We Use (Current)

4.1 Higgsfield (Image & Video Generation)

Higgsfield is our primary AI tool for:

  • Concept images

  • Start and end frames for AI videos

  • Short AI-generated video sequences

It supports multiple internal models with varying quality, speed, and credit cost.


4.2 HeyGen (Spokesperson Videos)

HeyGen is used specifically for:

  • AI spokesperson videos

  • Talking-head or presenter-style content

  • Scenarios where a human presence is required but live shooting is not feasible

HeyGen has its own workflow, separate from Higgsfield:

  • Script preparation is critical

  • Visual customization is limited compared to manual shoots

  • Outputs are often combined later with motion graphics or UI animations

HeyGen is used only when a spokesperson format is clearly required.


5. Core Production Workflow (Step-by-Step)

This workflow applies mainly to Higgsfield-based projects.

Step 1: Concept & Intent Clarity

Before any AI generation:

  • Define what the shot must communicate

  • Decide whether AI is used for:

    • Atmosphere

    • Illustration

    • Abstract visuals

    • Product-style representation

  • Identify if the output is a final asset or an intermediate asset for further editing

Step 2: Prompt Preparation (Critical Step)

Prompt preparation is the most important part of our AI workflow.
Most quality issues and credit waste come from weak or unclear prompts.

5.2.1 Official References & Learning

  • Always refer to Higgsfield’s official tutorial channel (e.g., https://www.youtube.com/watch?v=AdjllfZuqYM) for:

    • Model behavior

    • Prompt structure

    • New features and limitations

  • If you are not confident in writing prompts, use GPT to:

    • Refine your idea

    • Structure complex prompts

    • Convert visual intent into technical descriptions

GPT is a support tool, not a replacement for understanding the scene.

5.2.2 Prompting for Single Images vs Sequences

Generating one image at a time works fine for:

  • Standalone visuals

  • Mood shots

  • Abstract or atmospheric scenes

However, this approach fails for sequences, such as:

  • Story-driven visuals

  • Product showcases

  • Consistent characters

  • Multi-shot video scenes

Why?

  • Each new generation subtly changes:

    • Backgrounds

    • Lighting

    • Product details

    • Facial structure

  • This breaks visual continuity and makes sequence videos unusable.

5.2.3 About Higgsfield’s Popcorn Storyboard Feature

Higgsfield provides a Popcorn Storyboard feature that:

  • Accepts reference images

  • Generates scene-based sequences from descriptions

While useful in theory, in practice:

  • It has creative limitations

  • It frequently triggers false NSFW (Not Safe For Work) flags

    • Even for harmless details

  • This disrupts production flow and wastes time

Because of this, Popcorn Storyboard is not reliable for consistent production work.

5.2.4 Our Proven Method for Consistent Sequences (Recommended)

For consistent story, product, or character sequences, we use the following manual, reliable approach.

Step A: Generate a Cinematic Contact Sheet (3×3 Grid)

Instead of generating separate images, we:

  • Use the normal image generation page

  • Select a preferred model (commonly Nano Banana or Nano Banana Pro)

  • Generate a single 3×3 cinematic storyboard grid

This forces the AI to:

  • Lock the subject

  • Maintain environmental consistency

  • Preserve lighting, colors, and proportions

Example Prompt Structure (Reference Prompt)

Purpose: Generate 9 consistent cinematic shots of the same subject(s) in one environment.

Analyze the entire movie scene. Identify ALL key subjects present (whether it's a single person, a group/couple, a vehicle, or a specific object) and their spatial relationship/interaction. Generate a cohesive 3x3 grid "Cinematic Contact Sheet" featuring 9 distinct camera shots of exactly these subjects in the same environment.

You must adapt the standard cinematic shot types to fit the content:

- If a group, keep the group together

- If an object, frame the whole object

Row 1 (Establishing Context):

1. Extreme Long Shot (ELS)

2. Long Shot (LS)

3. Medium Long Shot (3/4 view)

Row 2 (Core Coverage):

4. Medium Shot (MS)

5. Medium Close-Up (MCU)

6. Close-Up (CU)

Row 3 (Details & Angles):

7. Extreme Close-Up (ECU)

8. Low Angle Shot

9. High Angle Shot

Ensure strict consistency:

- Same people or objects

- Same clothing/materials

- Same lighting and environment

Photorealistic textures, cinematic color grading, and realistic depth of field.

No repeated shots.

This grid becomes the visual backbone of the entire sequence.

5.2.5 Extracting High-Quality Frames from the Grid

Because the contact sheet is a single image:

  • Direct cropping may result in blur or quality loss

We use two reliable methods:

Method 1: AI-Based Frame Extraction

  • Re-upload the grid

  • Prompt example:

Extract a high-resolution frame from row 1, column 2.

Maintain original subject integrity, lighting, and proportions.

Repeat for all required frames.

Method 2: Crop + Enhance

  • Manually crop the desired frame (a Python sketch for this follows below)

  • Upload it back into Higgsfield

  • Use enhancement prompt:

High-resolution upscale, extreme detail.

Enhance clarity and fine textures while maintaining the original subject’s integrity.
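
For the manual crop in Method 2, splitting the grid programmatically is faster and more repeatable than cropping by eye. Below is a minimal Python sketch using Pillow; the file names and the even-grid assumption are ours, not Higgsfield's.

from PIL import Image  # pip install Pillow

def extract_cell(sheet_path, row, col, out_path):
    # Crop one cell from a 3x3 contact sheet (row/col are 1-indexed).
    # Assumes the grid is evenly divided with no gutters between cells;
    # adjust the offsets if the model renders visible borders.
    sheet = Image.open(sheet_path)
    cell_w, cell_h = sheet.width // 3, sheet.height // 3
    left, top = (col - 1) * cell_w, (row - 1) * cell_h
    sheet.crop((left, top, left + cell_w, top + cell_h)).save(out_path)

# Example: pull the Long Shot (row 1, column 2) before enhancement.
extract_cell("contact_sheet.png", row=1, col=2, out_path="frame_r1c2.png")

The extracted frame is then re-uploaded to Higgsfield with the enhancement prompt above.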

5.2.6 Moving to Video Generation

Only once we have clean, high-resolution, consistent frames do we proceed to video generation (Minimax / Kling). Starting from prepared frames results in:

  • Fewer retries

  • Better continuity

  • Reduced credit waste

This workflow is mandatory for:

  • Sequence videos

  • Product stories

  • Character-driven visuals

5.2.7 Review & Selection

After generation:

  • Review for visual consistency

  • Check faces, hands, text, and layout

  • Select only usable outputs for further processing

Rejected outputs are part of the process but must be minimized.

6. HeyGen Spokesperson Video Workflow

This workflow is used for AI spokesperson / talking-head videos where a human presence is required without live shooting. HeyGen is used after visual preparation is completed.

Avatar Image Preparation (Higgsfield)

All HeyGen avatars start with a high-quality image generated in Higgsfield.

Requirements for the Avatar Image

  • Front-facing or slight 3/4 angle

  • Neutral facial expression

  • Clear facial features (eyes, lips, jawline)

  • No motion blur or extreme lighting

  • Simple or studio-style background

Poor input images result in poor lip sync and unnatural movement, so do not skip this step.

Once finalized, export the image at the highest available resolution.


Creating the Virtual Character in HeyGen

  1. Open HeyGen

  2. Go to Avatars

  3. Select Create a Virtual Character

  4. Upload the Higgsfield image

  5. Let HeyGen process and generate the avatar

Quality Check

  • Facial proportions look natural

  • Mouth and eye alignment is correct

  • No visible distortions

If issues appear, regenerate the image in Higgsfield instead of forcing fixes in HeyGen.

Audio Strategy (Decide First)

Before building the video, decide which audio source will be used:

Option A: HeyGen AI Voice

Used for:

  • Internal demos

  • Fast drafts

  • Non-brand-critical videos

Option B: External Audio (Preferred for Final Delivery)

Used when:

  • Client provides voiceover

  • The brand requires premium voice quality

  • Specific tone, accent, or pronunciation is needed

External audio is usually generated using ElevenLabs or provided directly by the client.

External Audio Workflow (Client Audio or ElevenLabs)

Preparing the Audio

  • Format: .mp3 or .wav

  • Clean, noise-free audio

  • Natural pacing (not rushed)

  • Final script only (no placeholders)
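
As a pre-upload sanity check, the format, sample rate, and channel count can be verified with ffprobe. A minimal Python sketch, assuming ffmpeg/ffprobe is installed on the workstation (this check is our own convention, not a HeyGen requirement):

import subprocess

def probe_audio(path):
    # Print codec, sample rate, and channel count for a quick format check.
    result = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "stream=codec_name,sample_rate,channels",
         "-of", "default=noprint_wrappers=1", path],
        capture_output=True, text=True, check=True)
    print(result.stdout)

probe_audio("final_voiceover.wav")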

If generating audio:

  1. Use ElevenLabs

  2. Select the correct voice and tone

  3. Generate the final voiceover

  4. Download the audio file
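
For repeat projects, the same voiceover can be generated through ElevenLabs' public v1 REST API instead of the web UI. A minimal sketch, assuming the text-to-speech endpoint as documented at the time of writing; the API key, voice ID, and script text are placeholders:

import requests  # pip install requests

API_KEY = "YOUR_ELEVENLABS_API_KEY"  # placeholder
VOICE_ID = "YOUR_VOICE_ID"           # placeholder; use the approved brand voice

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY, "Accept": "audio/mpeg"},
    json={
        "text": "Final script text goes here.",
        "model_id": "eleven_multilingual_v2",  # model names change; check the docs
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    },
)
resp.raise_for_status()

with open("final_voiceover.mp3", "wb") as f:
    f.write(resp.content)  # MP3 bytes, ready to upload to HeyGen

The downloaded .mp3 is then uploaded to HeyGen exactly as in the manual workflow below.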

Uploading Audio & Lip Sync in HeyGen

  1. Open the HeyGen project

  2. Upload the external audio file

  3. HeyGen will:

    • Automatically transcribe the audio

    • Apply lip sync to the avatar based on the audio

Important Rules

  • Do not regenerate voice inside HeyGen once external audio is uploaded

  • Always review:

    • Lip sync accuracy

    • Timing of pauses

    • Natural facial movement

If lip sync feels off:

  • Slightly adjust the pacing in ElevenLabs

  • Re-upload the audio

  • Avoid regenerating the avatar unless necessary


Scene & Layout Setup

Configure the visual layout:

  • Avatar position (center / left / right)

  • Background:

    • Solid color

    • Gradient

    • Brand background (if provided)

Ensure safe framing:

  • Head not touching edges

  • Adequate space for text overlays

This ensures compatibility with website embeds and post-production graphics.

Video Generation & Review

  1. Generate the video

  2. Review for:

    • Lip sync accuracy

    • Facial realism

    • Natural head movement

    • No glitches or freezes

If issues appear:

  • First adjust the audio

  • Then adjust script pacing

  • Regenerate only if necessary

Export & Post-Production

  1. Export the final video from HeyGen

  2. Import into:

    • After Effects

    • Premiere Pro

    • Or any editing software used in the project

Common Post-Production Tasks

  • Add lower thirds

  • Insert UI overlays

  • Add subtitles

  • Apply brand styling

HeyGen output is treated as a base layer, not the final master.

7. Credit Usage Awareness

AI generation directly impacts cost.

Key points:

  • Every regeneration consumes credits

  • Higher resolution and longer duration increase the cost

  • Most credit loss comes from repeated retries, not final outputs
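
A quick worked example of why retries dominate cost, using hypothetical credit numbers (actual costs vary by model, resolution, and clip length):

# Hypothetical credit costs for illustration only.
DRAFT_COST = 10   # low-resolution test generation
FINAL_COST = 50   # 1080p final generation

def pipeline_cost(test_runs, final_runs=1):
    # Every attempt is billed, including rejected ones.
    return test_runs * DRAFT_COST + final_runs * FINAL_COST

print(pipeline_cost(test_runs=5))                # 100: iterate cheap, finalize once
print(pipeline_cost(test_runs=0, final_runs=6))  # 300: retrying at final quality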

Editors are expected to:

  • Be mindful of attempts

  • Avoid random experimentation

  • Treat credits as a production resource

8. Quality Challenges & AI Limitations

AI outputs are not perfect by default.

Common issues include:

  • Incorrect or unreadable text

  • Facial or hand distortions

  • Layout inconsistencies

  • Missing or altered elements

Because of this:

  • Multiple attempts are often required

  • AI outputs are reviewed carefully before use

  • AI visuals are frequently refined further in post-production

Understanding these limitations prevents unrealistic expectations.

9. Optimization & Best Practices

To maintain quality while controlling cost, we follow these rules:

  • Use AI only when needed

  • Separate draft exploration from final generation

  • Use lower resolutions for testing

  • Move to 1080p only for final delivery

  • Prepare prompts and references before generating

  • Avoid last-minute experimentation

Optimization is not optional; it is part of professional AI usage.

10. Conclusion

AI is evolving rapidly. New models, features, and workflows are being released almost every day. Because of this, our workflow is not fixed; it will continue to adapt as better tools and methods emerge.

For now, this SOP reflects our most practical and reliable approach using Higgsfield and HeyGen. We are also observing production houses building internal AI systems, using APIs, and setting up dedicated servers for automation and scalability. We are actively researching these directions and may move toward hybrid or internal solutions in the future.

Until then, our focus is simple:
Use AI strategically, control costs, maintain quality, and keep creative direction at the center of every project.