DSL Extraction: How It Works

DSL (Domain-Specific Language) extraction is the core innovation of Mosaic Builder: it transforms natural language descriptions into structured application models from which working code can be generated.

What is DSL Extraction?

DSL extraction involves:

  • Parsing natural language for domain concepts
  • Identifying entities, relationships, and business logic
  • Structuring information into a formal model
  • Tracking confidence and completeness
  • Generating code from the structured model

The Extraction Pipeline

    graph LR
        A[User Message] --> B[LLM Analysis]
        B --> C[Entity Detection]
        C --> D[Relationship Mapping]
        D --> E[Field Inference]
        E --> F[DSL Update]
        F --> G[Confidence Scoring]
        G --> H[Phase Transition]

    1. Message Analysis

    When you send a message like:

    "I need a blog where users can write posts and readers can comment"

    The system performs multiple analyses:

    interface MessageAnalysis {
      entities: string[]       // What entities are mentioned?
      actions: string[]        // What actions are described?
      relationships: string[]  // What relationships exist?
      intent: string           // What's the intent?
      confidence: number       // How confident are we? (0-1)
    }

    // Analysis of the message above:
    const analysis: MessageAnalysis = {
      entities: ['User', 'Post', 'Comment', 'Reader'],
      actions: ['write', 'comment'],
      relationships: [
        'User writes Post',
        'Reader comments on Post'
      ],
      intent: 'CREATE_BLOG_SYSTEM',
      confidence: 0.85
    }

    2. Entity Detection

    Entities are the core objects in your domain. The extractor identifies:

    #### Explicit Entities

  • Direct mentions: "users", "posts", "comments"
  • Named objects: "product", "order", "customer"
  • Domain terms: "invoice", "inventory", "employee"

    #### Implicit Entities

  • Derived from actions: "login" → User, Session
  • From relationships: "author of post" → Author entity
  • From fields: "category name" → Category entity

    #### Entity Confidence Scoring

    interface EntityConfidence {
      entity: string
      confidence: number  // 0-1
      source: 'explicit' | 'implicit' | 'inferred'
      reasoning: string
    }
    
    // Example:
    {
      entity: 'Post',
      confidence: 0.95,
      source: 'explicit',
      reasoning: 'Directly mentioned as "posts" in user message'
    }
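The distinction between explicit and implicit detection can be illustrated with a small keyword-based sketch. The lookup tables, the naive singularization, and the confidence values (0.95 and 0.7) are assumptions made for illustration; the real extractor relies on LLM analysis rather than fixed word lists.

```typescript
interface EntityConfidence {
  entity: string
  confidence: number // 0-1
  source: 'explicit' | 'implicit' | 'inferred'
  reasoning: string
}

// Hypothetical lookup tables standing in for the LLM's domain knowledge.
const KNOWN_NOUNS = new Map<string, string>([
  ['user', 'User'], ['post', 'Post'], ['comment', 'Comment'], ['reader', 'Reader'],
])
const ACTION_ENTITIES = new Map<string, string[]>([
  ['login', ['User', 'Session']],
])

function detectEntities(message: string): EntityConfidence[] {
  const found = new Map<string, EntityConfidence>()
  const words = message.toLowerCase().match(/[a-z]+/g) ?? []

  for (const word of words) {
    // Naive singularization: drop one trailing "s".
    const singular = word.endsWith('s') ? word.slice(0, -1) : word
    const explicit = KNOWN_NOUNS.get(singular)
    // An explicit mention replaces an earlier implicit detection.
    if (explicit !== undefined && found.get(explicit)?.source !== 'explicit') {
      found.set(explicit, {
        entity: explicit,
        confidence: 0.95,
        source: 'explicit',
        reasoning: `Directly mentioned as "${word}" in user message`,
      })
    }
    // Entities implied by an action word, unless already detected.
    for (const implied of ACTION_ENTITIES.get(word) ?? []) {
      if (!found.has(implied)) {
        found.set(implied, {
          entity: implied,
          confidence: 0.7,
          source: 'implicit',
          reasoning: `Derived from action "${word}"`,
        })
      }
    }
  }
  return Array.from(found.values())
}
```

Calling `detectEntities('users can login')` under these assumptions reports `User` as explicit and `Session` as implicit.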

    3. Relationship Extraction

    Relationships define how entities connect:

    #### Relationship Types

  • One-to-One: User has one Profile
  • One-to-Many: User has many Posts
  • Many-to-Many: Posts have many Tags

    #### Extraction Patterns

    The system recognizes relationship indicators:

  • Ownership: "users have posts" → One-to-Many
  • Belonging: "comment belongs to post" → Many-to-One
  • Association: "posts are tagged" → Many-to-Many

    #### Bidirectional Understanding

    // User says: "Posts have comments"
    // System understands:
    {
      forward: 'Post has many Comments',
      reverse: 'Comment belongs to one Post',
      type: 'one-to-many',
      required: true  // Comment must have a Post
    }
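As a rough sketch of the ownership and belonging patterns, the function below turns a short statement into both directions of a one-to-many relationship. The regular expressions, the `parseRelationship` and `toEntityName` helpers, and the naive singularization are hypothetical and cover only these two phrasings.

```typescript
interface ExpandedRelationship {
  forward: string
  reverse: string
  type: 'one-to-many'
}

// Naive normalization: drop one trailing "s" and capitalize.
// Words such as "address" would need a real inflection library.
function toEntityName(word: string): string {
  const singular = word.replace(/s$/i, '')
  return singular.charAt(0).toUpperCase() + singular.slice(1).toLowerCase()
}

function parseRelationship(statement: string): ExpandedRelationship | null {
  // Ownership: "posts have comments", "user has many posts"
  const owns = statement.match(/^(\w+) ha(?:ve|s) (?:many )?(\w+)$/i)
  // Belonging: "comment belongs to post"
  const belongs = statement.match(/^(\w+) belongs? to (\w+)$/i)

  let one: string
  let many: string
  if (owns) {
    one = toEntityName(owns[1])
    many = toEntityName(owns[2])
  } else if (belongs) {
    many = toEntityName(belongs[1])
    one = toEntityName(belongs[2])
  } else {
    return null // No recognized indicator
  }

  return {
    forward: `${one} has many ${many}s`,
    reverse: `${many} belongs to one ${one}`,
    type: 'one-to-many',
  }
}
```

Both "Posts have comments" and "comment belongs to post" resolve to the same pair of directions here, which is the point of storing the relationship bidirectionally.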

    4. Field Inference

    Fields are automatically inferred from context:

    #### Smart Field Detection

    // Illustrative inference rules: trigger → inferred fields
    const FIELD_INFERENCE_EXAMPLES = {
      // From entity name
      'User': ['email', 'password', 'name'],
      'Post': ['title', 'content', 'publishedAt'],
      'Product': ['name', 'price', 'description'],

      // From relationships
      'Post belongs to User': ['userId'],

      // From business rules
      'Posts can be drafted': ['status'],
      'Products have inventory': ['quantity'],

      // From common patterns
      'Audit trail needed': ['createdAt', 'updatedAt']
    }

    #### Type Inference

    // System infers types from field names and context:
    {
      email: 'string',         // with a unique constraint
      price: 'decimal',        // with a positive-value constraint
      publishedAt: 'datetime', // nullable
      status: 'enum',          // values taken from context
      description: 'text'      // for long content
    }
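Name-based type inference could be approximated with an ordered rule table, as in the sketch below. The rule list, the `InferredField` shape, and the fallback to `'string'` are assumptions for illustration; the source does not specify how the system maps names to types internally.

```typescript
type FieldType = 'string' | 'text' | 'decimal' | 'datetime' | 'enum'

interface InferredField {
  type: FieldType
  unique?: boolean
  nullable?: boolean
}

// Hypothetical rules, checked in order; the first match wins.
const RULES: Array<{ test: RegExp; result: InferredField }> = [
  { test: /email/i, result: { type: 'string', unique: true } },
  { test: /(price|amount|total)$/i, result: { type: 'decimal' } },
  { test: /At$/, result: { type: 'datetime', nullable: true } }, // publishedAt, createdAt
  { test: /^status$/i, result: { type: 'enum' } },
  { test: /(description|content|body)$/i, result: { type: 'text' } },
]

function inferField(name: string): InferredField {
  const rule = RULES.find(r => r.test.test(name))
  // Return a copy so callers cannot mutate the rule table.
  return rule ? { ...rule.result } : { type: 'string' }
}
```

A rule table keeps the heuristics easy to audit and extend, at the cost of missing anything that context alone would reveal, such as the allowed values of an enum.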

    The DSL Structure

    Core DSL Format

    interface DSLContext {
      // Domain entities
      entities: Record<string, Entity>
      
      // Entity relationships
      relationships: Relationship[]
      
      // UI pages/screens
      pages: Record<string, Page>
      
      // Business workflows
      workflows: Record<string, Workflow>
      
      // Security
      authentication: AuthConfig | null
      permissions: PermissionRule[]
      
      // Metadata
      appMetadata: {
        name: string
        description: string
        techStack: TechStackConfig
      }
      
      // Extraction state
      phase: Phase
      readiness: number  // 0-100
      confidence: number // 0-1
    }

    Entity Structure

    interface Entity {
      name: string
      fields: Record<string, Field>
      relationships?: EntityRelationship[]
      validation?: ValidationRule[]
      indexes?: Index[]
      
      // Tracking
      confidence: number
      source: {
        messageId: string
        reasoning: string
      }
    }
    
    interface Field {
      type: FieldType
      required: boolean
      unique?: boolean
      default?: any
      validation?: ValidationRule
      
      // For enums
      enumValues?: string[]
      
      // For relations
      references?: string
    }
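To show how the structure might be used, here is a sketch that checks every `references` value against the set of known entities. It uses simplified versions of `Entity` and `Field`, and it assumes that `references` holds the name of the target entity, which the interfaces above do not state explicitly. `findDanglingReferences` is a hypothetical helper.

```typescript
// Simplified versions of the Entity and Field shapes above.
interface Field { type: string; required: boolean; references?: string }
interface Entity { name: string; fields: Record<string, Field> }

// Assumption: `references` holds the name of the target entity.
function findDanglingReferences(entities: Record<string, Entity>): string[] {
  const problems: string[] = []
  for (const [entityName, entity] of Object.entries(entities)) {
    for (const [fieldName, field] of Object.entries(entity.fields)) {
      const target = field.references
      if (target !== undefined && !Object.prototype.hasOwnProperty.call(entities, target)) {
        problems.push(`${entityName}.${fieldName} references unknown entity "${target}"`)
      }
    }
  }
  return problems
}

// A partially extracted blog model: Post points at a User that is not defined yet.
const blog: Record<string, Entity> = {
  Post: {
    name: 'Post',
    fields: {
      title: { type: 'string', required: true },
      userId: { type: 'string', required: true, references: 'User' },
    },
  },
}
```

A check like this is one way to decide which clarifying question to ask next: here the missing `User` entity would be the obvious gap.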

    Extraction Strategies

    Progressive Enhancement

    Start simple, add complexity:

    // First pass: Basic structure
    {
      entities: {
        Post: { fields: { title: {...} } }
      }
    }
    
    // Second pass: Add relationships
    {
      relationships: [
        { from: 'Post', to: 'User', type: 'many-to-one' }
      ]
    }
    
    // Third pass: Add business logic
    {
      workflows: {
        publishPost: { ... }
      }
    }

    Context Accumulation

    Each message builds on previous understanding:

    class DSLExtractor {
      private context: DSLContext
      private messageHistory: Message[]
      
      async processMessage(message: string) {
        // Consider full conversation context
        const fullContext = this.buildContext()
        
        // Extract new information
        const extraction = await this.extract(message, fullContext)
        
        // Merge with existing DSL
        this.context = this.merge(this.context, extraction)
        
        // Update confidence based on consistency
        this.updateConfidence()
      }
    }
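The merge step is where accumulation happens. A minimal sketch for the entity portion follows, assuming that fields are unioned by name, that newer field definitions win, and that the higher confidence is kept. These merge rules and the `mergeEntities` helper are illustrative assumptions, not the documented behavior.

```typescript
interface Field { type: string; required: boolean }
interface Entity { name: string; fields: Record<string, Field>; confidence: number }
type Entities = Record<string, Entity>

function mergeEntities(existing: Entities, extracted: Entities): Entities {
  const merged: Entities = { ...existing }
  for (const [name, incoming] of Object.entries(extracted)) {
    const current = Object.prototype.hasOwnProperty.call(merged, name)
      ? merged[name]
      : undefined
    merged[name] = current
      ? {
          ...current,
          // Union of fields; a newer definition replaces an older one.
          fields: { ...current.fields, ...incoming.fields },
          confidence: Math.max(current.confidence, incoming.confidence),
        }
      : incoming // First time this entity is seen
  }
  return merged
}
```

The function returns a new object instead of mutating its inputs, so an earlier DSL snapshot stays intact if a later merge has to be rolled back.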

    Conflict Resolution

    When new information conflicts:

    interface ConflictResolution {
      strategy:
        | 'ask'       // Ask the user for clarification: "Did you mean X or Y?"
        | 'merge'     // Combine both interpretations
        | 'override'  // Use the interpretation with higher confidence
        | 'ignore'    // Keep the existing value if the new one has low confidence
    }
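One way to pick among the four strategies is a small decision function over the two confidence scores. The thresholds (0.5 for "low confidence", a 0.2 margin for overriding) and the `compatible` flag are assumptions chosen for this sketch.

```typescript
type Strategy = 'ask' | 'merge' | 'override' | 'ignore'

interface Interpretation {
  value: string
  confidence: number // 0-1
}

// Hypothetical thresholds; tune them against real conversations.
const LOW_CONFIDENCE = 0.5
const OVERRIDE_MARGIN = 0.2

function chooseStrategy(
  existing: Interpretation,
  incoming: Interpretation,
  compatible: boolean, // true when both interpretations can coexist
): Strategy {
  if (compatible) return 'merge'
  if (incoming.confidence < LOW_CONFIDENCE) return 'ignore'
  if (incoming.confidence - existing.confidence >= OVERRIDE_MARGIN) return 'override'
  // Similar confidence and incompatible: let the user decide.
  return 'ask'
}
```

Under these thresholds, asking the user is the fallback for genuinely ambiguous cases, which keeps the number of clarifying questions low.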

    Advanced Extraction Features

    Business Logic Detection

    // From: "Only admins can delete posts"
    {
      permissions: [
        {
          resource: 'Post',
          action: 'delete',
          condition: 'user.role === "admin"'
        }
      ]
    }
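For the specific phrasing "Only X can Y Z", the mapping to a permission rule can be sketched with a single regular expression. The pattern and the `extractPermission` helper are hypothetical and handle only this sentence shape; the actual system uses LLM analysis to cover freer wording.

```typescript
interface PermissionRule {
  resource: string
  action: string
  condition: string
}

function extractPermission(sentence: string): PermissionRule | null {
  // Captures role, action, and resource, dropping a trailing plural "s".
  const match = sentence.match(/^only (\w+?)s? can (\w+) (\w+?)s?$/i)
  if (!match) return null

  const [, role, action, resource] = match
  return {
    resource: resource.charAt(0).toUpperCase() + resource.slice(1).toLowerCase(),
    action: action.toLowerCase(),
    condition: `user.role === "${role.toLowerCase()}"`,
  }
}
```

Applied to "Only admins can delete posts", this yields the same rule as the example above.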

    Workflow Extraction

    // From: "When order is placed, send email and update inventory"
    {
      workflows: {
        placeOrder: {
          trigger: 'Order.created',
          steps: [
            { action: 'sendEmail', params: {...} },
            { action: 'updateInventory', params: {...} }
          ]
        }
      }
    }

    UI Inference

    // From: "Dashboard showing sales metrics"
    {
      pages: {
        dashboard: {
          type: 'dashboard',
          components: [
            { type: 'metric', data: 'sales' },
            { type: 'chart', data: 'revenue' }
          ]
        }
      }
    }

    Quality Metrics

    Extraction Confidence

    interface ConfidenceFactors {
      // How explicit was the mention? (0.0 to 1.0)
      explicitness: number

      // How consistent with context? (0.0 to 1.0)
      consistency: number

      // How complete is the information? (0.0 to 1.0)
      completeness: number

      // Overall confidence: weighted average of the factors above
      overall: number
    }
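The weighted average behind the overall score can be written out as follows. The weights (0.5, 0.3, 0.2) are placeholders picked for this sketch; the source does not state the actual weighting.

```typescript
interface Factors {
  explicitness: number
  consistency: number
  completeness: number
}

// Hypothetical weights; they do not need to sum to 1 because the
// result is normalized by their total.
const WEIGHTS: Factors = { explicitness: 0.5, consistency: 0.3, completeness: 0.2 }

function overallConfidence(factors: Factors): number {
  const keys = Object.keys(WEIGHTS) as Array<keyof Factors>
  const totalWeight = keys.reduce((sum, key) => sum + WEIGHTS[key], 0)
  const weightedSum = keys.reduce((sum, key) => sum + WEIGHTS[key] * factors[key], 0)
  return weightedSum / totalWeight
}
```

With these weights, a mention that is fully explicit but neither consistent nor complete scores 0.5 overall.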

    Readiness Calculation

    function calculateReadiness(dsl: DSLContext): number {
      const factors = {
        // Do all entities have fields?
        entitiesComplete: checkEntityCompleteness(dsl),
        
        // Are relationships defined?
        relationshipsDefined: checkRelationships(dsl),
        
        // Is authentication configured?
        authConfigured: checkAuth(dsl),
        
        // Are there enough details?
        detailLevel: checkDetailLevel(dsl)
      }
      
      return weightedAverage(factors)
    }
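To make the readiness checks concrete, the sketch below implements three of them against a reduced DSL shape and averages them with equal weights on the 0-100 scale. The `MiniDSL` type, the individual checks, and the equal weighting are assumptions; `checkDetailLevel` is omitted because the source gives no hint of what it measures.

```typescript
// Reduced stand-in for DSLContext, with only what the checks need.
interface MiniDSL {
  entities: Record<string, { fields: Record<string, unknown> }>
  relationships: unknown[]
  authentication: object | null
}

function readiness(dsl: MiniDSL): number {
  const entities = Object.values(dsl.entities)
  const withFields = entities.filter(e => Object.keys(e.fields).length > 0).length

  const factors = {
    // Share of entities that have at least one field
    entitiesComplete: entities.length > 0 ? withFields / entities.length : 0,
    // 1 if any relationship is defined, else 0
    relationshipsDefined: dsl.relationships.length > 0 ? 1 : 0,
    // 1 if authentication is configured, else 0
    authConfigured: dsl.authentication !== null ? 1 : 0,
  }

  // Equal weights for simplicity; scaled to 0-100 and rounded.
  const scores = Object.values(factors)
  const mean = scores.reduce((sum, score) => sum + score, 0) / scores.length
  return Math.round(mean * 100)
}
```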

    Best Practices for Better Extraction

    For Users

    #### Be Specific

    ✅ "Users have email, password, and profile photo"
    ❌ "Users have some fields"

    #### Describe Relationships Clearly

    ✅ "Each project has many tasks"
    ❌ "Projects and tasks are related"

    #### Include Business Rules

    ✅ "Only task assignee can mark as complete"
    ❌ "Add some permissions"

    For Developers

    #### Extraction Hints

    // Add hints to improve extraction
    const EXTRACTION_HINTS = {
      entities: {
        'User': ['email', 'password', 'role'],
        'Post': ['title', 'content', 'status']
      },
      relationships: {
        'ownership': 'one-to-many',
        'membership': 'many-to-many'
      }
    }

    #### Custom Extractors

    // Register domain-specific extractors
    registerExtractor('ecommerce', {
      entities: ['Product', 'Order', 'Customer'],
      // Each pattern maps a phrase to what it implies
      patterns: [
        [/shopping cart/i, 'Cart entity'],
        [/checkout/i, 'Order workflow']
      ]
    })
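A registry behind `registerExtractor` might look like the sketch below. The storage in a `Map`, the `ExtractorConfig` shape with patterns as `[RegExp, string]` pairs, and the `applyExtractor` helper are all assumptions made for illustration, not the actual Mosaic Builder API.

```typescript
// Assumed config shape: each pattern pairs a regex with what it implies.
interface ExtractorConfig {
  entities: string[]
  patterns: Array<[RegExp, string]>
}

const extractors = new Map<string, ExtractorConfig>()

// Hypothetical implementation of the registration call.
function registerExtractor(domain: string, config: ExtractorConfig): void {
  extractors.set(domain, config)
}

// Hypothetical helper: returns the implications of every matching pattern.
function applyExtractor(domain: string, message: string): string[] {
  const config = extractors.get(domain)
  if (!config) return []
  return config.patterns
    .filter(([pattern]) => pattern.test(message))
    .map(([, implication]) => implication)
}

registerExtractor('ecommerce', {
  entities: ['Product', 'Order', 'Customer'],
  patterns: [
    [/shopping cart/i, 'Cart entity'],
    [/checkout/i, 'Order workflow'],
  ],
})
```

The patterns here avoid the `g` flag on purpose: a global regex keeps state between `test` calls, which would make repeated matching unreliable.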

    Debugging Extraction

    View Extraction Logs

    # Enable debug mode
    DSL_DEBUG=true pnpm dev

    # Logs show:
    [EXTRACT] Detected entity: Post (confidence: 0.92)
    [EXTRACT] Inferred field: title (type: string)
    [EXTRACT] Found relationship: Post → User (many-to-one)
    [EXTRACT] Phase transition: discovering → clarifying

    Extraction Trace

    interface ExtractionTrace {
      input: string
      tokens: Token[]
      entities: EntityDetection[]
      relationships: RelationshipDetection[]
      fields: FieldInference[]
      conflicts: Conflict[]
      resolution: DSLDelta
    }

    Future Enhancements

    Planned Features

  • Visual DSL Editor: Drag-drop refinement
  • Extraction Templates: Pre-built patterns
  • Multi-language Support: Extract from any language
  • Code-to-DSL: Reverse engineer existing code
  • DSL Versioning: Track DSL evolution
  • Collaborative Extraction: Team-based refinement

    Related Documentation

  • Conversation Phases - Understanding the journey
  • Entity Modeling - Best practices
  • Decision System - How AI makes choices
  • Code Generation - From DSL to code