Discovering SyGra 2.0.0: A New Era of Synthetic Data Generation with Studio
The digital landscape is constantly evolving, offering fresh tools that simplify complex processes. Enter SyGra 2.0.0—the latest release that’s redefined synthetic data generation through its innovative feature, Studio. This interactive environment transforms the way users interact with synthetic data, making it an intuitive, visual craft rather than a chore of managing YAML files and terminals.
Why Choose SyGra Studio for Synthetic Data?
At the heart of SyGra’s appeal is Studio, designed to make synthetic data workflows not just easy but visually engaging and transparent. Users can create, preview, and execute data generation flows all from a single, straightforward interface. Imagine composing data flows on a canvas, easily tweaking prompts, and watching process executions unfold in real-time—all without diving into the complexities of code.
What Can You Do in Studio?
The functionalities of Studio are substantial and varied. Here’s a closer look at what this powerful tool offers:
- Guided Model Configuration: Easily configure and validate various models like OpenAI, Azure OpenAI, and Ollama through handy guided forms.
- Seamless Data Source Connectivity: Connect data sources from Hugging Face, ServiceNow, or your own file system, and preview sample rows before executing your workflow.
- Node Configuration: Choose models, craft prompts (with helpful auto-suggested variables), and define structured output schemas efficiently.
- Designing Downstream Outputs: Use shared state variables to design your outputs, bolstered by Pydantic for structured mappings.
- End-to-End Execution: Execute your flows and immediately review generated results with an intuitive node-level progress tracker.
- Comprehensive Debugging Tools: Utilize inline logs, breakpoints, and a Monaco-backed code editor for a streamlined debugging experience.
- Execution Monitoring: Keep track of token costs, latency, and outcomes with per-run execution history conveniently stored in .executions/.
Step-by-Step Experience with SyGra Studio
Step 1: Configure the Data Source
To kick things off in Studio, click Create Flow. Studio automatically generates the Start and End nodes for you. Here’s how to configure your data source:
- Select a connector from Hugging Face, disk, or ServiceNow.
- Input necessary parameters such as repo_id, split, or file path, then hit Preview to fetch sample rows.
- Your column names are auto-generated as state variables (like {prompt} and {genre}), offering clarity on what can be used in prompts and processors.
Once everything is validated, Studio keeps configurations in sync, removing any need for manual wiring.
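To make the idea of columns becoming state variables concrete, here is a minimal Python sketch. It is not SyGra's implementation: the sample rows, the helper names state_variables and missing_variables, and the prompt template are all assumptions made for illustration.

```python
from string import Formatter

# Hypothetical sample rows, standing in for what Preview might return.
sample_rows = [
    {"prompt": "A lighthouse keeper finds a map", "genre": "mystery"},
    {"prompt": "Two robots open a bakery", "genre": "comedy"},
]


def state_variables(rows):
    """Collect column names in first-seen order; each one maps to a {variable}."""
    names = []
    for row in rows:
        for column in row:
            if column not in names:
                names.append(column)
    return names


def missing_variables(template, available):
    """Return placeholders used in the template that no column provides."""
    used = [field for _, field, _, _ in Formatter().parse(template) if field]
    return [name for name in used if name not in available]


# Hypothetical prompt template that references the two columns.
template = "Write a {genre} story based on this idea: {prompt}"
variables = state_variables(sample_rows)

# Fill the template once per row, the way a prompt would be filled per record.
rendered = [template.format(**row) for row in sample_rows]
print(variables)
print(rendered[0])
```

Checking a template with missing_variables before a run is one way to catch a misspelled placeholder early, which is roughly the clarity the auto-generated variables are meant to provide.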
Step 2: Build the Flow Visually
With your data source configured, it’s time to visually create your flow. For instance, consider a story-generation pipeline:
- Drop an LLM node titled “Story Generator,” choose a model like gpt-4o-mini, and craft your prompt while saving the result to story_body.
- Add another LLM node called “Story Summarizer.” Reference {story_body} in the prompt and define your output as story_summary.
- Optionally, you can toggle structured outputs or insert additional tools and nodes for more complex logic.
Studio’s detail panel keeps everything well organized, enabling easy reference of model parameters, prompts, and tool configurations. Instantly access state variables by typing { in your prompts.
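The way two nodes share state can be sketched in a few lines of Python. This is an illustrative stand-in, not how Studio executes flows: fake_model, NODES, and run_flow are hypothetical names, the prompts are invented, and the model call is a stub that only echoes its input.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]


def fake_model(prompt: str) -> str:
    """Placeholder for a real model call (for example gpt-4o-mini); it only echoes."""
    return f"<output for: {prompt}>"


# Each node is (name, prompt template, output key), mirroring the two nodes above.
NODES: List[Tuple[str, str, str]] = [
    ("Story Generator", "Write a {genre} story about: {prompt}", "story_body"),
    ("Story Summarizer", "Summarize this story in one sentence: {story_body}", "story_summary"),
]


def run_flow(record: State, model: Callable[[str], str] = fake_model) -> State:
    """Run the nodes in order, writing each result back into the shared state."""
    state = dict(record)
    for _name, template, output_key in NODES:
        prompt = template.format(**state)  # raises KeyError if a variable is missing
        state[output_key] = model(prompt)
    return state


result = run_flow({"prompt": "a lost key", "genre": "mystery"})
print(result["story_summary"])
```

Because the summarizer's template references {story_body}, it can only run after the generator has written that key, which is the ordering the canvas edges express visually.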
Step 3: Review and Execute
As you build, the Code Panel provides access to the generated YAML/JSON configuration. You can verify what’s produced before committing. When you’re ready to run the flow, follow these steps:
- Click Run Workflow.
- Set your desired record counts, batch sizes, and retry behavior.
- Hit Run and enjoy watching real-time progress details stream in the Execution panel, which includes token usage, latency, and costs.
After running your workflow, you have options to download outputs and compare results against previous executions, gaining insights into latency and usage metrics.
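The run settings (record count, batch size, retries) can be pictured with a small Python sketch. Everything here is an assumption for illustration: run_records, flaky, and the returned fields are hypothetical, batches are processed sequentially, and token and cost tracking are left out because they depend on the model provider.

```python
import time
from typing import Callable, Dict, List


def run_records(
    records: List[dict],
    process: Callable[[dict], dict],
    num_records: int,
    batch_size: int,
    max_retries: int,
) -> Dict[str, object]:
    """Process up to num_records in batches, retrying each failed record."""
    selected = records[:num_records]
    outputs, failed, attempts = [], [], 0
    start = time.perf_counter()
    for i in range(0, len(selected), batch_size):
        for record in selected[i : i + batch_size]:
            for attempt in range(max_retries + 1):
                attempts += 1
                try:
                    outputs.append(process(record))
                    break
                except Exception:
                    # Give up on this record once retries are exhausted.
                    if attempt == max_retries:
                        failed.append(record)
    return {
        "outputs": outputs,
        "failed": failed,
        "attempts": attempts,
        "latency_s": time.perf_counter() - start,
    }


calls: Dict[int, int] = {}


def flaky(record: dict) -> dict:
    """Hypothetical processor that fails once for record 2, then succeeds."""
    calls[record["id"]] = calls.get(record["id"], 0) + 1
    if record["id"] == 2 and calls[2] == 1:
        raise RuntimeError("transient failure")
    return {"id": record["id"], "text": f"story {record['id']}"}


summary = run_records(
    [{"id": n} for n in range(1, 6)], flaky,
    num_records=4, batch_size=2, max_retries=1,
)
print(len(summary["outputs"]), summary["attempts"], summary["failed"])
```

With one retry allowed, the transient failure on record 2 costs one extra attempt and nothing ends up in the failed list.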
Running Existing Workflows
SyGra Studio can also execute existing workflows located in the tasks. For example, you can run the Glaive Code Assistant workflow, which uses the glaiveai/glaive-code-assistant-v2 dataset to draft and critique answers in a loop until the feedback is satisfactory.
Inside Studio, you will appreciate:
- Canvas Layout: Visual representation of LLM nodes (generate_answer and critique_answer) connected by flexible conditional edges.
- Tunable Inputs: Flexibility to adjust dataset splits, batch sizes, and temperatures without the headache of YAML syntax.
- Observable Execution: Live monitoring of both nodes, with insights into critiques and status updates during execution.
- Synthetic Outputs: Generated data is ready for training, evaluation, or annotation.
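The draft-and-critique loop with a conditional edge can be sketched as plain Python. The node names generate_answer and critique_answer come from the workflow described above, but their bodies here are stubs, and draft_and_critique and max_rounds are hypothetical additions to keep the loop bounded.

```python
from typing import Dict


def generate_answer(question: str, feedback: str) -> str:
    """Stub generator: a real node would call a model here."""
    suffix = " (revised)" if feedback else ""
    return f"Answer to: {question}{suffix}"


def critique_answer(answer: str) -> str:
    """Stub critic: returns empty feedback once the answer has been revised."""
    return "" if answer.endswith("(revised)") else "Please add more detail."


def draft_and_critique(question: str, max_rounds: int = 3) -> Dict[str, object]:
    """Alternate between the two nodes until the critic is satisfied."""
    feedback = ""
    answer = ""
    for round_number in range(1, max_rounds + 1):
        answer = generate_answer(question, feedback)
        feedback = critique_answer(answer)
        # Conditional edge: stop when the critic has no more feedback.
        if not feedback:
            return {"answer": answer, "rounds": round_number, "accepted": True}
    # Round limit reached without an accepted answer.
    return {"answer": answer, "rounds": max_rounds, "accepted": False}


result = draft_and_critique("How do I reverse a list in Python?")
print(result)
```

The round limit is a design choice in this sketch: without it, a critic that never approves would keep the loop running indefinitely.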
Getting Started with SyGra
Ready to dive in? You can get started with a few simple commands:
```bash
git clone https://github.com/ServiceNow/SyGra.git
cd SyGra && make studio
```
With SyGra Studio, transforming synthetic data workflows into an intuitive, user-friendly experience has never been easier. Configure once, build with confidence, and run with clarity—all from your unique digital canvas.

