DSL Extraction: How It Works

DSL (Domain-Specific Language) extraction is the core innovation of Mosaic Builder. It transforms natural language descriptions into structured application models from which working code can be generated.

What is DSL Extraction?

DSL extraction is the process of:

• Parsing natural language for domain concepts

• Identifying entities, relationships, and business logic

• Structuring information into a formal model

• Tracking confidence and completeness

• Generating code from the structured model

The Extraction Pipeline

graph LR
    A[User Message] --> B[LLM Analysis]
    B --> C[Entity Detection]
    C --> D[Relationship Mapping]
    D --> E[Field Inference]
    E --> F[DSL Update]
    F --> G[Confidence Scoring]
    G --> H[Phase Transition]

1. Message Analysis

When you send a message like:

> "I need a blog where users can write posts and readers can comment"

The system performs multiple analyses:

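interface MessageAnalysis {
  entities: string[]       // What entities are mentioned?
  actions: string[]        // What actions are described?
  relationships: string[]  // What relationships exist?
  intent: string           // What's the intent?
  confidence: number       // How confident are we? (0-1)
}

// Analysis of the example message:
const analysis: MessageAnalysis = {
  entities: ['User', 'Post', 'Comment', 'Reader'],
  actions: ['write', 'comment'],
  relationships: [
    'User writes Post',
    'Reader comments on Post'
  ],
  intent: 'CREATE_BLOG_SYSTEM',
  confidence: 0.85
}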

2. Entity Detection

Entities are the core objects in your domain. The extractor identifies:

#### Explicit Entities

• Direct mentions: "users", "posts", "comments"

• Named objects: "product", "order", "customer"

• Domain terms: "invoice", "inventory", "employee"

#### Implicit Entities

• Derived from actions: "login" → User, Session

• From relationships: "author of post" → Author entity

• From fields: "category name" → Category entity

#### Entity Confidence Scoring

interface EntityConfidence {
  entity: string
  confidence: number  // 0-1
  source: 'explicit' | 'implicit' | 'inferred'
  reasoning: string
}

// Example:
const example: EntityConfidence = {
  entity: 'Post',
  confidence: 0.95,
  source: 'explicit',
  reasoning: 'Directly mentioned as "posts" in user message'
}
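
To make the scoring concrete, here is a minimal sketch of how explicit mentions might be turned into `EntityConfidence` records. It is a keyword-based toy, not Mosaic Builder's extractor, which relies on LLM analysis. The `KNOWN_NOUNS` list, the `singularize` and `detectExplicitEntities` functions, and the fixed 0.95 score are all assumptions made for illustration.

```typescript
interface EntityConfidence {
  entity: string
  confidence: number // 0-1
  source: 'explicit' | 'implicit' | 'inferred'
  reasoning: string
}

// Hypothetical vocabulary of domain nouns (singular, lowercase).
const KNOWN_NOUNS = ['user', 'post', 'comment', 'product', 'order']

// Naive singularization: strips one trailing "s". Real text needs more care.
function singularize(word: string): string {
  const endsWithS = word.length > 1 && word.charAt(word.length - 1) === 's'
  return endsWithS ? word.slice(0, -1) : word
}

function detectExplicitEntities(message: string): EntityConfidence[] {
  const results: EntityConfidence[] = []
  const words = message.toLowerCase().match(/[a-z]+/g) || []
  for (const raw of words) {
    const noun = singularize(raw)
    const known = KNOWN_NOUNS.indexOf(noun) !== -1
    const seen = results.some(r => r.entity.toLowerCase() === noun)
    if (!known || seen) continue
    results.push({
      entity: noun.charAt(0).toUpperCase() + noun.slice(1),
      confidence: 0.95, // assumed score for a direct mention
      source: 'explicit',
      reasoning: `Directly mentioned as "${raw}" in user message`
    })
  }
  return results
}

const detected = detectExplicitEntities('Users can write posts and add comments')
console.log(detected.map(d => d.entity)) // [ 'User', 'Post', 'Comment' ]
```

Implicit entities (such as Session from "login") cannot be found this way, which is one reason a vocabulary lookup alone is not enough.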

3. Relationship Extraction

Relationships define how entities connect:

#### Relationship Types

• One-to-One: User has one Profile

• One-to-Many: User has many Posts

• Many-to-Many: Posts have many Tags

#### Extraction Patterns

The system recognizes relationship indicators:

• Ownership: "users have posts" → One-to-Many

• Belonging: "comment belongs to post" → Many-to-One

• Association: "posts are tagged" → Many-to-Many
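
The indicator patterns above can be sketched as a small lookup table of regular expressions. This is an illustrative approximation only: the `INDICATORS` table, the `detectRelationship` function, and the exact phrasings matched are assumptions, and the real extractor is not known to use regexes.

```typescript
type RelationshipType = 'one-to-one' | 'one-to-many' | 'many-to-one' | 'many-to-many'

interface DetectedRelationship {
  from: string
  to: string
  type: RelationshipType
}

// Hypothetical indicator patterns; each captures the two entity words.
const INDICATORS: { pattern: RegExp; type: RelationshipType }[] = [
  { pattern: /(\w+) belongs? to (\w+)/i, type: 'many-to-one' },     // belonging
  { pattern: /(\w+) (?:have|has) (\w+)/i, type: 'one-to-many' },    // ownership
  { pattern: /(\w+) are tagged with (\w+)/i, type: 'many-to-many' } // association
]

// Naive normalization: drop a trailing "s" and capitalize ("posts" becomes "Post").
function toEntityName(word: string): string {
  const singular = word.replace(/s$/i, '')
  return singular.charAt(0).toUpperCase() + singular.slice(1).toLowerCase()
}

// Returns the first matching indicator, or null when nothing matches.
function detectRelationship(sentence: string): DetectedRelationship | null {
  for (const { pattern, type } of INDICATORS) {
    const m = pattern.exec(sentence)
    if (m) return { from: toEntityName(m[1]), to: toEntityName(m[2]), type }
  }
  return null
}

console.log(detectRelationship('users have posts'))
console.log(detectRelationship('comment belongs to post'))
```

Pattern order matters here: the first match wins, so more specific phrasings should be listed before general ones.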

#### Bidirectional Understanding

// User says: "Posts have comments"
// System understands:
{
  forward: 'Post has many Comments',
  reverse: 'Comment belongs to one Post',
  type: 'one-to-many',
  required: true  // Comment must have a Post
}
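
Deriving the reverse direction is mechanical once the forward relationship is known. The sketch below shows one way to do it; the `DirectedRelationship` shape and the `reverseOf` and `describe` helpers are hypothetical names, not part of the documented DSL.

```typescript
type Cardinality = 'one-to-one' | 'one-to-many' | 'many-to-one' | 'many-to-many'

interface DirectedRelationship {
  from: string
  to: string
  type: Cardinality
}

// Swapping direction swaps the two sides of the cardinality.
const REVERSE: Record<Cardinality, Cardinality> = {
  'one-to-one': 'one-to-one',
  'one-to-many': 'many-to-one',
  'many-to-one': 'one-to-many',
  'many-to-many': 'many-to-many'
}

function reverseOf(rel: DirectedRelationship): DirectedRelationship {
  return { from: rel.to, to: rel.from, type: REVERSE[rel.type] }
}

// Renders a relationship as a sentence (pluralizes by appending "s" only).
function describe(rel: DirectedRelationship): string {
  switch (rel.type) {
    case 'one-to-one': return `${rel.from} has one ${rel.to}`
    case 'one-to-many': return `${rel.from} has many ${rel.to}s`
    case 'many-to-one': return `${rel.from} belongs to one ${rel.to}`
    case 'many-to-many': return `${rel.from} has many ${rel.to}s`
  }
}

const forward: DirectedRelationship = { from: 'Post', to: 'Comment', type: 'one-to-many' }
console.log(describe(forward))            // Post has many Comments
console.log(describe(reverseOf(forward))) // Comment belongs to one Post
```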

4. Field Inference

Fields are automatically inferred from context:

#### Smart Field Detection

// Illustrative inference rules: what is observed → which fields are inferred
const FIELD_INFERENCE: Record<string, string[]> = {
  // From entity name
  'User': ['email', 'password', 'name'],
  'Post': ['title', 'content', 'publishedAt'],
  'Product': ['name', 'price', 'description'],

  // From relationships
  'Post belongs to User': ['userId'],

  // From business rules
  'Posts can be drafted': ['status'],
  'Products have inventory': ['quantity'],

  // From common patterns
  'Audit trail needed': ['createdAt', 'updatedAt']
}

#### Type Inference

// System infers types from field names and context:
const inferredTypes = {
  email: { type: 'string', unique: true },
  price: { type: 'decimal', constraint: 'positive' },
  publishedAt: { type: 'datetime', nullable: true },
  status: { type: 'enum' },       // values taken from context
  description: { type: 'text' }   // for long content
}
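
As a rough illustration of name-based type inference, the sketch below maps a field name to a type using simple naming conventions. The `inferFieldType` function and its specific rules are assumptions; the actual system also uses conversational context, which this sketch ignores.

```typescript
interface InferredField {
  type: 'string' | 'text' | 'decimal' | 'datetime' | 'enum' | 'boolean'
  unique?: boolean
  nullable?: boolean
}

// Hypothetical naming heuristics, checked from most to least specific.
function inferFieldType(name: string): InferredField {
  const lower = name.toLowerCase()
  if (lower === 'email') return { type: 'string', unique: true }
  if (/(price|amount|total)$/.test(lower)) return { type: 'decimal' }
  if (/[a-z]At$/.test(name)) return { type: 'datetime', nullable: true } // publishedAt, createdAt
  if (/^is[A-Z]/.test(name)) return { type: 'boolean' }                  // isActive
  if (lower === 'status') return { type: 'enum' }
  if (lower === 'description' || lower === 'content') return { type: 'text' }
  return { type: 'string' } // fallback when no rule applies
}

for (const field of ['email', 'price', 'publishedAt', 'status', 'title']) {
  console.log(field, inferFieldType(field).type)
}
```

Falling back to `string` keeps the model usable even when no rule fires, at the cost of a lower-confidence guess.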

The DSL Structure

Core DSL Format

interface DSLContext {
  // Domain entities
  entities: Record<string, Entity>
  
  // Entity relationships
  relationships: Relationship[]
  
  // UI pages/screens
  pages: Record<string, Page>
  
  // Business workflows
  workflows: Record<string, Workflow>
  
  // Security
  authentication: AuthConfig | null
  permissions: PermissionRule[]
  
  // Metadata
  appMetadata: {
    name: string
    description: string
    techStack: TechStackConfig
  }
  
  // Extraction state
  phase: Phase
  readiness: number  // 0-100
  confidence: number // 0-1
}

Entity Structure

interface Entity {
  name: string
  fields: Record<string, Field>
  relationships?: EntityRelationship[]
  validation?: ValidationRule[]
  indexes?: Index[]
  
  // Tracking
  confidence: number
  source: {
    messageId: string
    reasoning: string
  }
}

interface Field {
  type: FieldType
  required: boolean
  unique?: boolean
  default?: any
  validation?: ValidationRule
  
  // For enums
  enumValues?: string[]
  
  // For relations
  references?: string
}
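
Because `references` is a plain string, nothing in the type system guarantees that it names a real entity. A consistency check along the following lines could catch that; `findDanglingReferences` is a hypothetical helper, and the interfaces are trimmed copies of the ones above so the example stands alone.

```typescript
// Trimmed versions of the Field and Entity shapes shown above.
interface Field {
  type: string
  required: boolean
  references?: string
}

interface Entity {
  name: string
  fields: Record<string, Field>
}

// Lists every field whose `references` target is not a defined entity.
function findDanglingReferences(entities: Record<string, Entity>): string[] {
  const problems: string[] = []
  for (const entityName of Object.keys(entities)) {
    const fields = entities[entityName].fields
    for (const fieldName of Object.keys(fields)) {
      const target = fields[fieldName].references
      if (target !== undefined && !(target in entities)) {
        problems.push(`${entityName}.${fieldName} references unknown entity "${target}"`)
      }
    }
  }
  return problems
}

const blog: Record<string, Entity> = {
  Post: {
    name: 'Post',
    fields: {
      title: { type: 'string', required: true },
      userId: { type: 'string', required: true, references: 'User' }
    }
  }
}

console.log(findDanglingReferences(blog))
// [ 'Post.userId references unknown entity "User"' ]
```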

Extraction Strategies

Progressive Enhancement

Start simple, add complexity:

// First pass: Basic structure
{
  entities: {
    Post: { fields: { title: {...} } }
  }
}

// Second pass: Add relationships
{
  relationships: [
    { from: 'Post', to: 'User', type: 'many-to-one' }
  ]
}

// Third pass: Add business logic
{
  workflows: {
    publishPost: { ... }
  }
}

Context Accumulation

Each message builds on previous understanding:

class DSLExtractor {
  private context: DSLContext
  private messageHistory: Message[]
  
  async processMessage(message: string) {
    // Consider full conversation context
    const fullContext = this.buildContext()
    
    // Extract new information
    const extraction = await this.extract(message, fullContext)
    
    // Merge with existing DSL
    this.context = this.merge(this.context, extraction)
    
    // Update confidence based on consistency
    this.updateConfidence()
  }
}
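
The merge step is where accumulation actually happens. Below is a minimal sketch of one possible merge policy for entities: fields are unioned and the higher confidence is kept. The `TrackedEntity` shape, the `mergeEntities` function, and the policy itself are assumptions, since the source does not specify how `merge` behaves.

```typescript
// Simplified entity record for illustration: field names only.
interface TrackedEntity {
  name: string
  fields: string[]
  confidence: number
}

type EntityMap = Record<string, TrackedEntity>

// Returns a new map; neither input is mutated.
function mergeEntities(existing: EntityMap, incoming: EntityMap): EntityMap {
  const merged: EntityMap = { ...existing }
  for (const name of Object.keys(incoming)) {
    const next = incoming[name]
    const prev = merged[name]
    if (!prev) {
      merged[name] = next // entity seen for the first time
      continue
    }
    // Union of field names, preserving the order in which they were learned.
    const added = next.fields.filter(f => prev.fields.indexOf(f) === -1)
    merged[name] = {
      name,
      fields: prev.fields.concat(added),
      confidence: Math.max(prev.confidence, next.confidence) // assumed policy
    }
  }
  return merged
}

const afterFirstMessage: EntityMap = {
  Post: { name: 'Post', fields: ['title'], confidence: 0.8 }
}
const fromSecondMessage: EntityMap = {
  Post: { name: 'Post', fields: ['title', 'content'], confidence: 0.9 },
  Comment: { name: 'Comment', fields: ['body'], confidence: 0.7 }
}

console.log(mergeEntities(afterFirstMessage, fromSecondMessage))
```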

Conflict Resolution

When new information conflicts:

interface ConflictResolution {
  strategy: 'ask' | 'merge' | 'override' | 'ignore'
}

// What each strategy does (helper functions are placeholders):
const strategies = {
  // Ask user for clarification
  ask: () => 'Did you mean X or Y?',

  // Merge both interpretations
  merge: () => combineInterpretations(),

  // Override with higher confidence
  override: () => useHigherConfidence(),

  // Ignore if low confidence
  ignore: () => keepExisting()
}
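
How a strategy gets chosen is not specified above. One plausible rule, sketched here, picks it from the two confidence scores. The `chooseStrategy` function, the `Interpretation` shape, the `compatible` flag, and both thresholds are assumptions for illustration.

```typescript
type Strategy = 'ask' | 'merge' | 'override' | 'ignore'

interface Interpretation {
  value: string
  confidence: number // 0-1
}

const LOW_CONFIDENCE = 0.3  // assumed cutoff below which new info is ignored
const CLEAR_MARGIN = 0.2    // assumed lead needed to override without asking

function chooseStrategy(
  existing: Interpretation,
  incoming: Interpretation,
  compatible: boolean // true when both interpretations can coexist
): Strategy {
  if (compatible) return 'merge'
  if (incoming.confidence < LOW_CONFIDENCE) return 'ignore'
  if (incoming.confidence - existing.confidence > CLEAR_MARGIN) return 'override'
  return 'ask' // close call: let the user decide
}

const current = { value: 'Post.status is boolean', confidence: 0.5 }
const update = { value: 'Post.status is enum', confidence: 0.9 }
console.log(chooseStrategy(current, update, false)) // override
```

Asking is the fallback because a wrong silent override is usually more costly than one extra clarifying question.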

Advanced Extraction Features

Business Logic Detection

// From: "Only admins can delete posts"
{
  permissions: [
    {
      resource: 'Post',
      action: 'delete',
      condition: 'user.role === "admin"'
    }
  ]
}

Workflow Extraction

// From: "When order is placed, send email and update inventory"
{
  workflows: {
    placeOrder: {
      trigger: 'Order.created',
      steps: [
        { action: 'sendEmail', params: {...} },
        { action: 'updateInventory', params: {...} }
      ]
    }
  }
}
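
To show how an extracted workflow could be executed, the sketch below dispatches each step to a handler when the trigger event fires. Everything here is hypothetical: the `runWorkflows` function, the `handlers` table, and the `to` and `delta` parameters are invented for the example, since the source elides the step parameters.

```typescript
interface WorkflowStep {
  action: string
  params: Record<string, unknown>
}

interface Workflow {
  trigger: string
  steps: WorkflowStep[]
}

type ActionHandler = (params: Record<string, unknown>) => string

// Hypothetical handlers that only describe what they would do.
const handlers: Record<string, ActionHandler> = {
  sendEmail: p => `email sent to ${String(p['to'])}`,
  updateInventory: p => `inventory changed by ${String(p['delta'])}`
}

// Runs, in order, the steps of every workflow whose trigger matches the event.
function runWorkflows(event: string, workflows: Record<string, Workflow>): string[] {
  const log: string[] = []
  for (const name of Object.keys(workflows)) {
    const workflow = workflows[name]
    if (workflow.trigger !== event) continue
    for (const step of workflow.steps) {
      const handler = handlers[step.action]
      log.push(handler ? handler(step.params) : `no handler for ${step.action}`)
    }
  }
  return log
}

const workflows: Record<string, Workflow> = {
  placeOrder: {
    trigger: 'Order.created',
    steps: [
      { action: 'sendEmail', params: { to: 'customer@example.com' } }, // invented params
      { action: 'updateInventory', params: { delta: -1 } }
    ]
  }
}

console.log(runWorkflows('Order.created', workflows))
```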

UI Inference

// From: "Dashboard showing sales metrics"
{
  pages: {
    dashboard: {
      type: 'dashboard',
      components: [
        { type: 'metric', data: 'sales' },
        { type: 'chart', data: 'revenue' }
      ]
    }
  }
}

Quality Metrics

Extraction Confidence

interface ConfidenceFactors {
  // How explicit was the mention? (0.0 - 1.0)
  explicitness: number

  // How consistent with context? (0.0 - 1.0)
  consistency: number

  // How complete is the information? (0.0 - 1.0)
  completeness: number

  // Overall confidence: weighted average of the factors above
  overall: number
}
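
The overall score is described as a weighted average of the three factors. A worked sketch follows; the weights in `WEIGHTS` and the `overallConfidence` function are assumptions, since the source does not state the actual weighting.

```typescript
interface Factors {
  explicitness: number // 0-1
  consistency: number  // 0-1
  completeness: number // 0-1
}

// Assumed weights; they do not need to sum to 1 because the result is normalized.
const WEIGHTS: Factors = { explicitness: 0.5, consistency: 0.3, completeness: 0.2 }

function overallConfidence(f: Factors): number {
  const weighted =
    f.explicitness * WEIGHTS.explicitness +
    f.consistency * WEIGHTS.consistency +
    f.completeness * WEIGHTS.completeness
  const weightSum = WEIGHTS.explicitness + WEIGHTS.consistency + WEIGHTS.completeness
  return weighted / weightSum
}

// 1.0 * 0.5 + 0.8 * 0.3 + 0.5 * 0.2 = 0.84
const overall = overallConfidence({ explicitness: 1.0, consistency: 0.8, completeness: 0.5 })
console.log(overall.toFixed(2)) // 0.84
```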

Readiness Calculation

function calculateReadiness(dsl: DSLContext): number {
  const factors = {
    // Do all entities have fields?
    entitiesComplete: checkEntityCompleteness(dsl),
    
    // Are relationships defined?
    relationshipsDefined: checkRelationships(dsl),
    
    // Is authentication configured?
    authConfigured: checkAuth(dsl),
    
    // Are there enough details?
    detailLevel: checkDetailLevel(dsl)
  }
  
  return weightedAverage(factors)
}
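
For a concrete picture of the helper checks, here is a self-contained sketch that scores a reduced DSL on the 0-100 readiness scale. The `MiniDSL` shape, the weights, and the way each check is computed are assumptions; the detail-level factor is omitted because the source does not say how it is measured.

```typescript
// Reduced DSL shape with only what the checks below need.
interface MiniDSL {
  entities: Record<string, { fields: Record<string, unknown> }>
  relationships: unknown[]
  authentication: object | null
}

// Fraction of entities that have at least one field (0-1).
function checkEntityCompleteness(dsl: MiniDSL): number {
  const names = Object.keys(dsl.entities)
  if (names.length === 0) return 0
  const withFields = names.filter(n => Object.keys(dsl.entities[n].fields).length > 0)
  return withFields.length / names.length
}

const READINESS_WEIGHTS = { entities: 0.4, relationships: 0.3, auth: 0.3 } // assumed

function calculateReadiness(dsl: MiniDSL): number {
  const w = READINESS_WEIGHTS
  const score =
    checkEntityCompleteness(dsl) * w.entities +
    (dsl.relationships.length > 0 ? 1 : 0) * w.relationships +
    (dsl.authentication !== null ? 1 : 0) * w.auth
  const total = w.entities + w.relationships + w.auth
  return Math.round((score / total) * 100) // matches the 0-100 readiness scale
}

const draft: MiniDSL = {
  entities: { Post: { fields: { title: {} } }, Comment: { fields: {} } },
  relationships: [{ from: 'Comment', to: 'Post' }],
  authentication: null
}

console.log(calculateReadiness(draft)) // 50
```

In this example half of the entities have fields (0.5 × 0.4), relationships exist (1 × 0.3), and authentication is missing (0 × 0.3), giving 0.5, or 50.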

Best Practices for Better Extraction

For Users

#### Be Specific

✅ "Users have email, password, and profile photo"

❌ "Users have some fields"

#### Describe Relationships Clearly

✅ "Each project has many tasks"

❌ "Projects and tasks are related"

#### Include Business Rules

✅ "Only task assignee can mark as complete"

❌ "Add some permissions"

For Developers

#### Extraction Hints

// Add hints to improve extraction
const EXTRACTION_HINTS = {
  entities: {
    'User': ['email', 'password', 'role'],
    'Post': ['title', 'content', 'status']
  },
  relationships: {
    'ownership': 'one-to-many',
    'membership': 'many-to-many'
  }
}

#### Custom Extractors

// Register domain-specific extractors
registerExtractor('ecommerce', {
  entities: ['Product', 'Order', 'Customer'],
  // Each pattern pairs a regex with what a match suggests
  patterns: [
    [/shopping cart/i, 'Cart entity'],
    [/checkout/i, 'Order workflow']
  ]
})
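
What a registry behind `registerExtractor` might look like is sketched below. This is not Mosaic Builder's implementation: the `ExtractorConfig` shape (patterns as regex and hint pairs), the in-memory `registry`, and the `collectHints` function are all assumptions made to show the idea.

```typescript
// Assumed config shape: each pattern pairs a regex with a hint string.
interface ExtractorConfig {
  entities: string[]
  patterns: [RegExp, string][]
}

const registry = new Map<string, ExtractorConfig>()

function registerExtractor(domain: string, config: ExtractorConfig): void {
  registry.set(domain, config)
}

// Returns the hints of every pattern in the domain that matches the message.
function collectHints(domain: string, message: string): string[] {
  const config = registry.get(domain)
  if (!config) return []
  return config.patterns
    .filter(([pattern]) => pattern.test(message))
    .map(([, hint]) => hint)
}

registerExtractor('ecommerce', {
  entities: ['Product', 'Order', 'Customer'],
  patterns: [
    [/shopping cart/i, 'Cart entity'],
    [/checkout/i, 'Order workflow']
  ]
})

console.log(collectHints('ecommerce', 'Add a shopping cart and a checkout page'))
// [ 'Cart entity', 'Order workflow' ]
```

The patterns deliberately omit the `g` flag, so `RegExp.test` carries no state between calls.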

Debugging Extraction

View Extraction Logs

# Enable debug mode
DSL_DEBUG=true pnpm dev

# Logs show:
[EXTRACT] Detected entity: Post (confidence: 0.92)
[EXTRACT] Inferred field: title (type: string)
[EXTRACT] Found relationship: Post → User (many-to-one)
[EXTRACT] Phase transition: discovering → clarifying

Extraction Trace

interface ExtractionTrace {
  input: string
  tokens: Token[]
  entities: EntityDetection[]
  relationships: RelationshipDetection[]
  fields: FieldInference[]
  conflicts: Conflict[]
  resolution: DSLDelta
}

Future Enhancements

Planned Features

• Visual DSL Editor: Drag-drop refinement

• Extraction Templates: Pre-built patterns

• Multi-language Support: Extract from any language

• Code-to-DSL: Reverse engineer existing code

• DSL Versioning: Track DSL evolution

• Collaborative Extraction: Team-based refinement