DSL Extraction: How It Works
DSL (Domain-Specific Language) extraction is the core innovation of Mosaic Builder. It transforms natural language descriptions into structured application models from which working code can be generated.
What is DSL Extraction?
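DSL extraction is the process of analyzing each user message, detecting the entities it mentions, mapping the relationships between them, inferring fields, and updating the DSL with a confidence score. The pipeline below shows these steps in order.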
The Extraction Pipeline
graph LR
A[User Message] --> B[LLM Analysis]
B --> C[Entity Detection]
C --> D[Relationship Mapping]
D --> E[Field Inference]
E --> F[DSL Update]
F --> G[Confidence Scoring]
G --> H[Phase Transition]
1. Message Analysis
When you send a message like:
> "I need a blog where users can write posts and readers can comment"
The system performs multiple analyses:
interface MessageAnalysis {
  // What entities are mentioned?
  entities: string[] // e.g. ['User', 'Post', 'Comment', 'Reader']
  // What actions are described?
  actions: string[] // e.g. ['write', 'comment']
  // What relationships exist?
  relationships: string[] // e.g. ['User writes Post', 'Reader comments on Post']
  // What's the intent?
  intent: string // e.g. 'CREATE_BLOG_SYSTEM'
  // How confident are we?
  confidence: number // e.g. 0.85
}
2. Entity Detection
Entities are the core objects in your domain. The extractor identifies:
#### Explicit Entities
#### Implicit Entities
#### Entity Confidence Scoring
interface EntityConfidence {
entity: string
confidence: number // 0-1
source: 'explicit' | 'implicit' | 'inferred'
reasoning: string
}
// Example:
{
entity: 'Post',
confidence: 0.95,
source: 'explicit',
reasoning: 'Directly mentioned as "posts" in user message'
}
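To illustrate how such scores might be used, here is a minimal sketch that keeps only candidates above a cutoff and orders them by confidence. The `MIN_CONFIDENCE` value, the `selectEntities` helper, and the `Reader` and `Comment` sample records are assumptions made for illustration; the source does not document an actual threshold.

```typescript
type EntitySource = 'explicit' | 'implicit' | 'inferred';

interface EntityConfidence {
  entity: string;
  confidence: number; // 0-1
  source: EntitySource;
  reasoning: string;
}

// Assumed cutoff for this sketch; not a documented Mosaic Builder setting.
const MIN_CONFIDENCE = 0.6;

// Keep candidates at or above the cutoff, highest confidence first.
function selectEntities(
  candidates: EntityConfidence[],
  minConfidence: number = MIN_CONFIDENCE,
): EntityConfidence[] {
  return candidates
    .filter((c) => c.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence);
}

// Illustrative sample data (scores other than Post's are made up).
const candidates: EntityConfidence[] = [
  { entity: 'Reader', confidence: 0.55, source: 'inferred', reasoning: 'May be a role of User rather than a separate entity' },
  { entity: 'Post', confidence: 0.95, source: 'explicit', reasoning: 'Directly mentioned as "posts" in user message' },
  { entity: 'Comment', confidence: 0.9, source: 'explicit', reasoning: 'Derived from the verb "comment"' },
];

const accepted = selectEntities(candidates).map((c) => c.entity);
console.log(accepted); // [ 'Post', 'Comment' ]
```

Low-scoring candidates such as `Reader` would be natural subjects for a clarifying question rather than being dropped silently.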
3. Relationship Extraction
Relationships define how entities connect:
#### Relationship Types
#### Extraction Patterns
The system recognizes relationship indicators:
#### Bidirectional Understanding
// User says: "Posts have comments"
// System understands:
{
forward: 'Post has many Comments',
reverse: 'Comment belongs to one Post',
type: 'one-to-many',
required: true // Comment must have a Post
}
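The forward and reverse readings above can be derived mechanically from a relationship type. The sketch below shows one way to do that; the `expandRelationship` helper, the set of relationship types, and the naive pluralization are all assumptions for illustration, not the extractor's actual logic.

```typescript
type RelationshipType = 'one-to-one' | 'one-to-many' | 'many-to-many';

interface BidirectionalRelationship {
  forward: string;
  reverse: string;
  type: RelationshipType;
}

// Naive pluralization for illustration only (ignores irregular nouns).
const plural = (noun: string): string => `${noun}s`;

// Build both readings of a relationship from its type.
function expandRelationship(
  from: string,
  to: string,
  type: RelationshipType,
): BidirectionalRelationship {
  switch (type) {
    case 'one-to-one':
      return { forward: `${from} has one ${to}`, reverse: `${to} belongs to one ${from}`, type };
    case 'one-to-many':
      return { forward: `${from} has many ${plural(to)}`, reverse: `${to} belongs to one ${from}`, type };
    case 'many-to-many':
      return { forward: `${from} has many ${plural(to)}`, reverse: `${to} has many ${plural(from)}`, type };
  }
}

const rel = expandRelationship('Post', 'Comment', 'one-to-many');
console.log(rel.forward); // Post has many Comments
console.log(rel.reverse); // Comment belongs to one Post
```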
4. Field Inference
Fields are automatically inferred from context:
#### Smart Field Detection
// Illustrative inference rules: source of the hint → inferred fields
const fieldInferenceExamples = {
  // From entity name
  'User': ['email', 'password', 'name'],
  'Post': ['title', 'content', 'publishedAt'],
  'Product': ['name', 'price', 'description'],
  // From relationships
  'Post belongs to User': ['userId'],
  // From business rules
  'Posts can be drafted': ['status'],
  'Products have inventory': ['quantity'],
  // From common patterns
  'Audit trail needed': ['createdAt', 'updatedAt']
}
#### Type Inference
// System infers types from field names and context:
{
  email: 'string',         // plus a unique constraint
  price: 'decimal',        // plus a positive-value constraint
  publishedAt: 'datetime', // nullable
  status: 'enum',          // values taken from context
  description: 'text'      // for long content
}
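A name-based heuristic is one plausible way to arrive at the types listed above. The following sketch is an assumption about how such rules could look; `inferFieldType`, the `InferredField` shape, and the specific rules are hypothetical, and the real extractor may rely on the LLM instead.

```typescript
interface InferredField {
  type: 'string' | 'text' | 'decimal' | 'datetime' | 'enum';
  unique?: boolean;
  nullable?: boolean;
  positive?: boolean;
}

// Guess a field's type and constraints from its name alone.
function inferFieldType(fieldName: string): InferredField {
  const lower = fieldName.toLowerCase();
  if (lower === 'email') return { type: 'string', unique: true };
  if (lower === 'price') return { type: 'decimal', positive: true };
  // camelCase names ending in "At" (publishedAt, createdAt) read as timestamps
  if (/[a-z]At$/.test(fieldName)) return { type: 'datetime', nullable: true };
  if (lower === 'status') return { type: 'enum' };
  if (lower === 'description' || lower === 'content') return { type: 'text' };
  // Fall back to a plain string when no rule applies
  return { type: 'string' };
}

console.log(inferFieldType('publishedAt')); // { type: 'datetime', nullable: true }
console.log(inferFieldType('title'));       // { type: 'string' }
```

Rules like these are cheap and predictable, but they only cover names that were anticipated, which is why context from the conversation still matters for fields such as `status`.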
The DSL Structure
Core DSL Format
interface DSLContext {
// Domain entities
entities: Record<string, Entity>
// Entity relationships
relationships: Relationship[]
// UI pages/screens
pages: Record<string, Page>
// Business workflows
workflows: Record<string, Workflow>
// Security
authentication: AuthConfig | null
permissions: PermissionRule[]
// Metadata
appMetadata: {
name: string
description: string
techStack: TechStackConfig
}
// Extraction state
phase: Phase
readiness: number // 0-100
confidence: number // 0-1
}
Entity Structure
interface Entity {
name: string
fields: Record<string, Field>
relationships?: EntityRelationship[]
validation?: ValidationRule[]
indexes?: Index[]
// Tracking
confidence: number
source: {
messageId: string
reasoning: string
}
}
interface Field {
type: FieldType
required: boolean
unique?: boolean
default?: any
validation?: ValidationRule
// For enums
enumValues?: string[]
// For relations
references?: string
}
Extraction Strategies
Progressive Enhancement
Start simple, add complexity:
// First pass: Basic structure
{
  entities: {
    Post: { fields: { title: {...} } }
  }
}
// Second pass: Add relationships
{
  relationships: [
    { from: 'Post', to: 'User', type: 'many-to-one' }
  ]
}
// Third pass: Add business logic
{
  workflows: {
    publishPost: { ... }
  }
}
Context Accumulation
Each message builds on previous understanding:
class DSLExtractor {
  private context: DSLContext
  private messageHistory: Message[]

  async processMessage(message: string) {
    // Consider full conversation context
    const fullContext = this.buildContext()
    // Extract new information
    const extraction = await this.extract(message, fullContext)
    // Merge with existing DSL
    this.context = this.merge(this.context, extraction)
    // Update confidence based on consistency
    this.updateConfidence()
  }
}
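The `merge` step is where accumulation actually happens. As a rough sketch of what it could do for entities, the code below adds new fields and lets a higher-confidence field replace a lower-confidence one. The `FieldDraft`, `EntityDraft`, and `mergeEntities` names are hypothetical simplifications of the DSL types, and the confidence rule is an assumption.

```typescript
interface FieldDraft { type: string; confidence: number }
interface EntityDraft { fields: Record<string, FieldDraft> }
type EntityMap = Record<string, EntityDraft>;

// Merge incoming entities into existing ones without mutating either input.
function mergeEntities(existing: EntityMap, incoming: EntityMap): EntityMap {
  const merged: EntityMap = {};
  for (const [entityName, entity] of Object.entries(existing)) {
    merged[entityName] = { fields: { ...entity.fields } };
  }
  for (const [entityName, entity] of Object.entries(incoming)) {
    const target = merged[entityName] ?? { fields: {} };
    for (const [fieldName, field] of Object.entries(entity.fields)) {
      const current = target.fields[fieldName];
      // New fields are added; known fields change only on higher confidence.
      if (current === undefined || field.confidence > current.confidence) {
        target.fields[fieldName] = field;
      }
    }
    merged[entityName] = target;
  }
  return merged;
}

const existingDsl: EntityMap = {
  Post: { fields: { title: { type: 'string', confidence: 0.9 }, status: { type: 'string', confidence: 0.4 } } },
};
const incomingDsl: EntityMap = {
  Post: { fields: { status: { type: 'enum', confidence: 0.8 }, content: { type: 'text', confidence: 0.7 } } },
  User: { fields: { email: { type: 'string', confidence: 0.9 } } },
};

const mergedDsl = mergeEntities(existingDsl, incomingDsl);
console.log(Object.keys(mergedDsl.Post.fields)); // [ 'title', 'status', 'content' ]
console.log(mergedDsl.Post.fields.status.type);  // enum
```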
Conflict Resolution
When new information conflicts:
interface ConflictResolution {
  strategy: 'ask' | 'merge' | 'override' | 'ignore'
  // 'ask': ask the user for clarification ("Did you mean X or Y?")
  // 'merge': combine both interpretations
  // 'override': use the interpretation with higher confidence
  // 'ignore': keep the existing value when the new one has low confidence
}
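How one of the four strategies gets picked is not specified above. The sketch below shows one possible decision rule based on confidence scores; `chooseStrategy`, the `compatible` flag, and both thresholds are assumptions introduced purely for illustration.

```typescript
type ConflictStrategy = 'ask' | 'merge' | 'override' | 'ignore';

interface Interpretation {
  value: string;
  confidence: number; // 0-1
}

// Assumed thresholds for this sketch; not documented values.
const LOW_CONFIDENCE = 0.5;
const CLEAR_MARGIN = 0.2;

function chooseStrategy(
  existing: Interpretation,
  incoming: Interpretation,
  compatible: boolean, // true when both interpretations can coexist
): ConflictStrategy {
  if (compatible) return 'merge';
  // Weak new evidence is not worth disturbing the current model.
  if (incoming.confidence < LOW_CONFIDENCE) return 'ignore';
  // Clearly stronger new evidence replaces the old interpretation.
  if (incoming.confidence - existing.confidence >= CLEAR_MARGIN) return 'override';
  // Otherwise the two are too close to call, so ask the user.
  return 'ask';
}

console.log(chooseStrategy({ value: 'string', confidence: 0.6 }, { value: 'enum', confidence: 0.9 }, false));  // override
console.log(chooseStrategy({ value: 'string', confidence: 0.8 }, { value: 'enum', confidence: 0.85 }, false)); // ask
```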
Advanced Extraction Features
Business Logic Detection
// From: "Only admins can delete posts"
{
permissions: [
{
resource: 'Post',
action: 'delete',
condition: 'user.role === "admin"'
}
]
}
Workflow Extraction
// From: "When order is placed, send email and update inventory"
{
workflows: {
placeOrder: {
trigger: 'Order.created',
steps: [
{ action: 'sendEmail', params: {...} },
{ action: 'updateInventory', params: {...} }
]
}
}
}
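To show how an extracted workflow of this shape could be consumed, here is a minimal sketch of a runner that executes the steps of every workflow whose trigger matches an event. The `runWorkflows` function and the handler registry are hypothetical, the `params` are left empty because the source elides them, and real handlers would likely be asynchronous.

```typescript
interface WorkflowStep { action: string; params: Record<string, unknown> }
interface Workflow { trigger: string; steps: WorkflowStep[] }
type ActionHandler = (params: Record<string, unknown>) => void;

// Run, in order, the steps of each workflow triggered by `eventName`.
function runWorkflows(
  eventName: string,
  workflows: Record<string, Workflow>,
  handlers: Record<string, ActionHandler>,
): string[] {
  const executed: string[] = [];
  for (const workflow of Object.values(workflows)) {
    if (workflow.trigger !== eventName) continue;
    for (const step of workflow.steps) {
      const handler = handlers[step.action];
      if (!handler) throw new Error(`No handler registered for action "${step.action}"`);
      handler(step.params);
      executed.push(step.action);
    }
  }
  return executed;
}

const workflows: Record<string, Workflow> = {
  placeOrder: {
    trigger: 'Order.created',
    steps: [
      { action: 'sendEmail', params: {} },       // params elided in the source
      { action: 'updateInventory', params: {} },
    ],
  },
};

// Stub handlers that only record that they were called.
const callLog: string[] = [];
const handlers: Record<string, ActionHandler> = {
  sendEmail: () => { callLog.push('email sent'); },
  updateInventory: () => { callLog.push('inventory updated'); },
};

const executed = runWorkflows('Order.created', workflows, handlers);
console.log(executed); // [ 'sendEmail', 'updateInventory' ]
```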
UI Inference
// From: "Dashboard showing sales metrics"
{
pages: {
dashboard: {
type: 'dashboard',
components: [
{ type: 'metric', data: 'sales' },
{ type: 'chart', data: 'revenue' }
]
}
}
}
Quality Metrics
Extraction Confidence
interface ConfidenceFactors {
  // How explicit was the mention?
  explicitness: number // 0.0 to 1.0
  // How consistent with context?
  consistency: number // 0.0 to 1.0
  // How complete is the information?
  completeness: number // 0.0 to 1.0
  // Overall confidence
  overall: number // weighted average of the three factors above
}
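The overall score is described as a weighted average of the three factors. A minimal sketch of that calculation follows; the weights are assumptions chosen for illustration, since the source does not state them.

```typescript
interface Factors {
  explicitness: number; // 0.0 to 1.0
  consistency: number;  // 0.0 to 1.0
  completeness: number; // 0.0 to 1.0
}

// Assumed weights; the actual weighting is not documented.
const FACTOR_WEIGHTS: Factors = { explicitness: 0.5, consistency: 0.3, completeness: 0.2 };

// Weighted average: sum(factor * weight) / sum(weight)
function overallConfidence(factors: Factors, weights: Factors = FACTOR_WEIGHTS): number {
  const weightedSum =
    factors.explicitness * weights.explicitness +
    factors.consistency * weights.consistency +
    factors.completeness * weights.completeness;
  const totalWeight = weights.explicitness + weights.consistency + weights.completeness;
  return weightedSum / totalWeight;
}

const overall = overallConfidence({ explicitness: 1.0, consistency: 0.8, completeness: 0.5 });
console.log(overall.toFixed(2)); // 0.84
```

Dividing by the total weight keeps the result in the 0 to 1 range even if the weights do not sum to exactly 1.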
Readiness Calculation
function calculateReadiness(dsl: DSLContext): number {
const factors = {
// Do all entities have fields?
entitiesComplete: checkEntityCompleteness(dsl),
// Are relationships defined?
relationshipsDefined: checkRelationships(dsl),
// Is authentication configured?
authConfigured: checkAuth(dsl),
// Are there enough details?
detailLevel: checkDetailLevel(dsl)
}
return weightedAverage(factors)
}
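The helper functions used above are not shown. As a simplified sketch under stated assumptions, the version below fills in three of the four checks and a `weightedAverage` that takes explicit weights, then scales the result to the 0-100 range used by `readiness`. The `MiniDSL` type, the weights, and the check logic are all hypothetical, and `detailLevel` is omitted for brevity.

```typescript
// Reduced stand-in for DSLContext, just enough for this sketch.
interface MiniDSL {
  entities: Record<string, { fields: Record<string, unknown> }>;
  relationships: unknown[];
  authentication: object | null;
}

// Fraction of entities that have at least one field.
function checkEntityCompleteness(dsl: MiniDSL): number {
  const entities = Object.values(dsl.entities);
  if (entities.length === 0) return 0;
  const withFields = entities.filter((e) => Object.keys(e.fields).length > 0).length;
  return withFields / entities.length;
}

function weightedAverage(scores: Record<string, number>, weights: Record<string, number>): number {
  let sum = 0;
  let total = 0;
  for (const [key, score] of Object.entries(scores)) {
    const weight = weights[key] ?? 0;
    sum += score * weight;
    total += weight;
  }
  return total === 0 ? 0 : sum / total;
}

// Returns a readiness percentage from 0 to 100.
function calculateReadinessSketch(dsl: MiniDSL): number {
  const factors = {
    entitiesComplete: checkEntityCompleteness(dsl),
    relationshipsDefined: dsl.relationships.length > 0 ? 1 : 0,
    authConfigured: dsl.authentication !== null ? 1 : 0,
  };
  // Assumed weights: entity completeness counts double.
  const weights = { entitiesComplete: 2, relationshipsDefined: 1, authConfigured: 1 };
  return Math.round(100 * weightedAverage(factors, weights));
}

const draft: MiniDSL = {
  entities: { Post: { fields: { title: {} } }, User: { fields: {} } },
  relationships: [{ from: 'Post', to: 'User', type: 'many-to-one' }],
  authentication: null,
};
console.log(calculateReadinessSketch(draft)); // 50
```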
Best Practices for Better Extraction
For Users
#### Be Specific
✅ "Users have email, password, and profile photo"
❌ "Users have some fields"

#### Describe Relationships Clearly
✅ "Each project has many tasks"
❌ "Projects and tasks are related"

#### Include Business Rules
✅ "Only task assignee can mark as complete"
❌ "Add some permissions"
For Developers
#### Extraction Hints
// Add hints to improve extraction
const EXTRACTION_HINTS = {
entities: {
'User': ['email', 'password', 'role'],
'Post': ['title', 'content', 'status']
},
relationships: {
'ownership': 'one-to-many',
'membership': 'many-to-many'
}
}
#### Custom Extractors
// Register domain-specific extractors
registerExtractor('ecommerce', {
  entities: ['Product', 'Order', 'Customer'],
  // Each pattern is a [regex, what a match indicates] pair
  patterns: [
    [/shopping cart/i, 'Cart entity'],
    [/checkout/i, 'Order workflow']
  ]
})
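For readers wondering what sits behind a call like the one above, here is a hypothetical sketch of a registry that stores extractors per domain and reports which patterns a message matches. This is not Mosaic Builder's actual implementation; the `DomainExtractor` shape, the `[regex, hint]` pair format, and `collectHints` are assumptions for illustration.

```typescript
type PatternHint = [RegExp, string]; // [regex, what a match indicates]

interface DomainExtractor {
  entities: string[];
  patterns: PatternHint[];
}

// Hypothetical in-memory registry keyed by domain name.
const extractorRegistry = new Map<string, DomainExtractor>();

function registerExtractor(domain: string, extractor: DomainExtractor): void {
  extractorRegistry.set(domain, extractor);
}

// Return the hints of every registered pattern that matches the message.
function collectHints(domain: string, message: string): string[] {
  const extractor = extractorRegistry.get(domain);
  if (!extractor) return [];
  return extractor.patterns
    .filter(([pattern]) => pattern.test(message))
    .map(([, hint]) => hint);
}

registerExtractor('ecommerce', {
  entities: ['Product', 'Order', 'Customer'],
  patterns: [
    [/shopping cart/i, 'Cart entity'],
    [/checkout/i, 'Order workflow'],
  ],
});

const hints = collectHints('ecommerce', 'I want a shopping cart and a checkout page');
console.log(hints); // [ 'Cart entity', 'Order workflow' ]
```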
Debugging Extraction
View Extraction Logs
# Enable debug mode
DSL_DEBUG=true pnpm dev
# Logs show:
[EXTRACT] Detected entity: Post (confidence: 0.92)
[EXTRACT] Inferred field: title (type: string)
[EXTRACT] Found relationship: Post → User (many-to-one)
[EXTRACT] Phase transition: discovering → clarifying
Extraction Trace
interface ExtractionTrace {
input: string
tokens: Token[]
entities: EntityDetection[]
relationships: RelationshipDetection[]
fields: FieldInference[]
conflicts: Conflict[]
resolution: DSLDelta
}