Confidence Extraction

The @prism-lang/confidence package provides tools for extracting, calibrating, and managing confidence values from sources such as LLM responses, sensors, and external APIs. This guide walks through the extraction methods, calibration options, and production patterns available in Prism.

Installation

npm install @prism-lang/confidence
# or
yarn add @prism-lang/confidence
# or
pnpm add @prism-lang/confidence

Quick Start

import { confidence, smartExtract } from '@prism-lang/confidence';

// Simple extraction from text
const result = await confidence.extract("I'm fairly certain Paris is the capital of France.");
console.log(result.value); // 0.75
console.log(result.explanation); // "Response analysis confidence: 75.0% based on hedging: 85%, certainty: 80%, specificity: 70%, completeness: 65%"

// Smart extraction (auto-detects best method)
const conf = await smartExtract("The answer is definitely 42.");
console.log(conf); // 0.9

Three Levels of API

The confidence extraction library provides three levels of API complexity:

Level 1: Dead Simple

import { ConfidenceExtractor } from '@prism-lang/confidence';

const extractor = new ConfidenceExtractor();
const result = await extractor.extract("I think the answer might be 42");
console.log(result.value); // 0.65

Level 2: Some Control

const result = await extractor.extractWithOptions(
  "The answer is definitely 42",
  {
    method: 'response_analysis',
    checkHedging: true,
    checkCertainty: true
  }
);

Level 3: Full Control

// Use specific extraction methods directly
const consistencyResult = await extractor.fromConsistency(
  async () => llm.complete("What is 2+2?"),
  { samples: 5, aggregation: 'mean' }
);

const analysisResult = await extractor.fromResponseAnalysis(
  "I'm absolutely certain the answer is 4",
  {
    checkHedging: true,
    checkCertainty: true,
    checkSpecificity: true,
    checkCompleteness: true
  }
);

const structuredResult = await extractor.fromStructuredResponse(
  "The answer is 4 (confidence: 95%)"
);

Extraction Methods

1. Consistency-Based Extraction

Extract confidence by analyzing consistency across multiple samples:

const result = await extractor.fromConsistency(
  async () => {
    // Your sampling function
    return await llm.complete("Explain quantum mechanics");
  },
  {
    samples: 5,           // Number of samples to collect
    aggregation: 'mean',  // How to aggregate: mean, median, mode, weighted
    timeout: 30000        // Optional timeout
  }
);

console.log(result.value); // 0.72
console.log(result.explanation); // "Moderate confidence (72.0%): 4/5 samples agreed. 2 unique variations found."

2. Response Analysis

Analyze linguistic features to determine confidence:

const result = await extractor.fromResponseAnalysis(
  "I believe this might possibly be correct, though I'm not entirely sure.",
  {
    checkHedging: true,       // Look for hedging phrases
    checkCertainty: true,     // Look for certainty markers
    checkSpecificity: true,   // Check for specific details
    checkCompleteness: true,  // Evaluate response completeness
    customMarkers: {
      low: ['possibly', 'might be', 'could be'],
      high: ['definitely', 'certainly', 'absolutely']
    }
  }
);

Built-in Hedging Phrases (lower confidence):

  • might be, possibly, perhaps, could be, may be
  • it seems, appears to, suggests that, likely
  • probably, uncertain, not sure, hard to say

Built-in Certainty Phrases (higher confidence):

  • definitely, certainly, absolutely, clearly
  • obviously, without doubt, for sure, undoubtedly
  • conclusively, unquestionably

3. Structured Response Extraction

Extract confidence from structured formats in responses:

const result = await extractor.fromStructuredResponse(
  "The capital of France is Paris (confidence: 98%)"
);
// Automatically detects and extracts: 0.98

// Supported patterns:
// - "confidence: 85%"
// - "confidence: 8.5/10"
// - "certainty: high/medium/low"
// - "(90% confident)"
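How the library matches these patterns internally is not documented; as a rough illustration of the idea, a regex-based extractor for the formats listed above could look like the following sketch (extractConfidencePattern and the high/medium/low score mapping are hypothetical, not the library's implementation):

```typescript
// Illustrative pattern-based extraction for the formats listed above.
function extractConfidencePattern(text: string): number | null {
  const fraction = text.match(/confidence:\s*(\d+(?:\.\d+)?)\s*\/\s*10/i); // "confidence: 8.5/10"
  const percent = text.match(/(\d+(?:\.\d+)?)\s*%/);                      // "confidence: 85%", "(90% confident)"
  const word = text.match(/certainty:\s*(high|medium|low)/i);             // "certainty: high"
  const wordScores: Record<string, number> = { high: 0.9, medium: 0.6, low: 0.3 };

  if (fraction) return parseFloat(fraction[1]) / 10;
  if (percent) return parseFloat(percent[1]) / 100;
  if (word) return wordScores[word[1].toLowerCase()];
  return null; // no structured confidence marker found
}
```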

Calibration

Domain-Specific Calibration

Adjust confidence based on domain expertise:

import { DomainCalibrator } from '@prism-lang/confidence';

const calibrator = new DomainCalibrator({
  domain: 'medical',
  curves: {
    diagnosis: {
      baseConfidence: 0.7,
      adjustments: {
        'has_lab_results': 0.15,
        'has_imaging': 0.1,
        'multiple_symptoms': -0.1
      }
    },
    treatment: {
      baseConfidence: 0.8,
      adjustments: {
        'fda_approved': 0.1,
        'off_label': -0.2
      }
    }
  }
});

const calibrated = await calibrator.calibrate(
  { value: 0.75 },
  'diagnosis',
  { has_lab_results: true }
);
console.log(calibrated.value); // 0.9 (0.75 + 0.15)
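The adjustments are additive on top of the raw value, as the 0.75 + 0.15 arithmetic above shows. A minimal sketch of that step (applyAdjustments is a hypothetical helper, and clamping to [0, 1] is an assumption, not documented behavior):

```typescript
// Additive domain adjustment: start from the raw confidence, add each
// adjustment whose context flag is set, then clamp to [0, 1] (assumed).
function applyAdjustments(
  raw: number,
  adjustments: Record<string, number>,
  context: Record<string, boolean>
): number {
  let value = raw;
  for (const [flag, delta] of Object.entries(adjustments)) {
    if (context[flag]) value += delta;
  }
  return Math.min(1, Math.max(0, value));
}
```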

Security Calibration

Adjust confidence for security-critical operations:

import { SecurityCalibrator } from '@prism-lang/confidence';

const secCalibrator = new SecurityCalibrator();

// Reduces confidence for high-risk operations
const result = await secCalibrator.calibrate(
  { value: 0.9 },
  { riskLevel: 'high' }
);
console.log(result.value); // 0.72 (reduced by 20%)

// Risk levels: low (no change), medium (-10%), high (-20%), critical (-30%)
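The percentages above read as multiplicative reductions (0.9 × 0.8 = 0.72 in the example). A sketch of that arithmetic, assuming the multiplicative form — the real SecurityCalibrator may apply it differently:

```typescript
// Documented risk-level reductions, applied multiplicatively (assumed).
const riskReduction: Record<string, number> = {
  low: 0,        // no change
  medium: 0.1,   // -10%
  high: 0.2,     // -20%
  critical: 0.3  // -30%
};

function securityCalibrate(value: number, riskLevel: string): number {
  return value * (1 - (riskReduction[riskLevel] ?? 0));
}
```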

Interactive Calibration

Learn from user feedback:

import { InteractiveCalibrator } from '@prism-lang/confidence';

const interactiveCalibrator = new InteractiveCalibrator();

// Record feedback
interactiveCalibrator.recordFeedback(
  { value: 0.8, provenance: { sources: [{ method: 'linguistic' }] } },
  true // Was the prediction correct?
);

// Use learned calibration
const calibrated = await interactiveCalibrator.calibrate({ value: 0.8 });

Ensemble Methods

Combine multiple confidence sources:

import { ConfidenceEnsemble } from '@prism-lang/confidence';

const ensemble = new ConfidenceEnsemble({
  consistency: 0.4,
  linguistic: 0.3,
  structured: 0.3
});

const combined = await ensemble.combine([
  { value: 0.8, provenance: { sources: [{ method: 'consistency' }] } },
  { value: 0.7, provenance: { sources: [{ method: 'linguistic' }] } },
  { value: 0.9, provenance: { sources: [{ method: 'structured' }] } }
]);

console.log(combined.value); // 0.8 (weighted average)
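The combination above is a plain weighted average: 0.8 × 0.4 + 0.7 × 0.3 + 0.9 × 0.3 = 0.8. A sketch of just that arithmetic (the real ConfidenceEnsemble also merges provenance, which is omitted here):

```typescript
// Weighted average of confidence values; weights need not sum to 1.
function weightedAverage(values: number[], weights: number[]): number {
  const totalWeight = weights.reduce((sum, w) => sum + w, 0);
  const weightedSum = values.reduce((sum, v, i) => sum + v * weights[i], 0);
  return weightedSum / totalWeight;
}
```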

Advanced Patterns

Confidence Budget Management

Ensure minimum confidence across a set of operations:

import { ConfidenceBudgetManager } from '@prism-lang/confidence';

const budgetManager = new ConfidenceBudgetManager(0.7); // Minimum 70% total

budgetManager.add('step1', 0.9);
budgetManager.add('step2', 0.8);
budgetManager.add('step3', 0.85);

console.log(budgetManager.isWithinBudget()); // false (0.612 is below the 0.7 minimum)
console.log(budgetManager.getTotalConfidence()); // 0.612 (0.9 * 0.8 * 0.85)
console.log(budgetManager.getWeakestLink()); // { value: 'step2', confidence: 0.8 }
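The budget math is a product of the per-step confidences, which is why even three individually strong steps (0.9, 0.8, 0.85) can land below a 0.7 minimum: uncertainty compounds. A minimal sketch of that calculation (illustrative, not the library code):

```typescript
// Step confidences multiply, so each uncertain step compounds the total.
function totalConfidence(stepConfidences: number[]): number {
  return stepConfidences.reduce((product, c) => product * c, 1);
}
```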

Confidence Contracts

Define and validate confidence requirements:

import { ConfidenceContractManager } from '@prism-lang/confidence';

const contract = new ConfidenceContractManager({
  'data_quality': 0.8,
  'model_accuracy': 0.75,
  'input_validation': 0.9
});

const validation = contract.validate({
  'data_quality': 0.85,
  'model_accuracy': 0.7, // Below threshold!
  'input_validation': 0.95
});

console.log(validation.isValid); // false
console.log(validation.failures); // ['model_accuracy: 0.7 < 0.75']

Differential Confidence

Track confidence across multiple aspects:

import { DifferentialConfidenceManager } from '@prism-lang/confidence';

const diffManager = new DifferentialConfidenceManager();

const result = diffManager.calculate({
  'accuracy': 0.9,
  'completeness': 0.7,
  'timeliness': 0.8
});

console.log(result.average); // 0.8
console.log(result.variance); // 0.007
console.log(result.min); // { aspect: 'completeness', value: 0.7 }
console.log(result.recommendation); // "Focus on improving completeness (current: 0.7)"
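The statistics above are straightforward to reproduce: the mean of (0.9, 0.7, 0.8) is 0.8, and the population variance is about 0.0067, which matches the documented 0.007 when rounded. A sketch of that calculation (population variance is an assumption; the recommendation string is generated by the library and not reproduced here):

```typescript
// Mean, population variance, and weakest aspect over a set of scores.
function differentialStats(aspects: Record<string, number>) {
  const entries = Object.entries(aspects);
  const values = entries.map(([, v]) => v);
  const average = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance = values.reduce((sum, v) => sum + (v - average) ** 2, 0) / values.length;
  const [aspect, value] = entries.reduce((min, e) => (e[1] < min[1] ? e : min));
  return { average, variance, min: { aspect, value } };
}
```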

Temporal Confidence

Model confidence decay over time:

import { TemporalConfidence } from '@prism-lang/confidence';

const temporal = new TemporalConfidence(
  24, // 24-hour half-life
  'exponential'
);

const aged = temporal.apply(
  { value: 0.9 },
  12 // 12 hours old
);

console.log(aged.value); // 0.636 (0.9 * 0.707)
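The exponential mode follows the standard half-life formula: value × 0.5^(age / halfLife). At half the half-life, the factor is 0.5^0.5 ≈ 0.707, giving the 0.636 above. A sketch of just that formula (illustrative, not the library code):

```typescript
// Exponential half-life decay: value * 0.5^(age / halfLife).
function decayConfidence(value: number, ageHours: number, halfLifeHours: number): number {
  return value * Math.pow(0.5, ageHours / halfLifeHours);
}
```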

Source-Specific Extractors

Sensor Confidence

import { SensorConfidenceExtractor } from '@prism-lang/confidence';

const sensorExtractor = new SensorConfidenceExtractor();

const confidence = await sensorExtractor.extract({
  age: 5, // 5 minutes old
  environment: { temperature: 25, humidity: 0.6 },
  history: 100, // 100 previous readings
  calibrationDate: new Date('2024-01-01')
});

API Confidence

import { APIConfidenceExtractor } from '@prism-lang/confidence';

const apiExtractor = new APIConfidenceExtractor();

const confidence = await apiExtractor.extract({
  provider: 'weather-api',
  historicalAccuracy: 0.92,
  latency: 150, // ms
  lastFailure: new Date('2024-01-15')
});

Real-World Examples

Example 1: LLM Response with Confidence

import { ConfidenceExtractor } from '@prism-lang/confidence';
import { ClaudeProvider, LLMRequest } from '@prism-lang/llm';

const llm = new ClaudeProvider(process.env.CLAUDE_API_KEY);
const extractor = new ConfidenceExtractor();

async function queryWithConfidence(prompt: string) {
  // Get LLM response
  const response = await llm.complete(new LLMRequest(prompt));

  // Extract confidence from response
  const confidence = await extractor.fromResponseAnalysis(response.content);

  return {
    answer: response.content,
    confidence: confidence.value,
    explanation: confidence.explanation
  };
}

const result = await queryWithConfidence("What causes rain?");
console.log(`Answer: ${result.answer}`);
console.log(`Confidence: ${(result.confidence * 100).toFixed(1)}%`);

Example 2: Multi-Source Confidence

import { ConfidenceEnsemble, ConfidenceExtractor } from '@prism-lang/confidence';

async function robustQuery(prompt: string) {
  const extractor = new ConfidenceExtractor();

  // Method 1: Consistency across multiple runs
  const consistencyConf = await extractor.fromConsistency(
    async () => llm.complete(prompt),
    { samples: 3 }
  );

  // Method 2: Single response analysis
  const singleResponse = await llm.complete(prompt);
  const analysisConf = await extractor.fromResponseAnalysis(singleResponse.content);

  // Method 3: Check for structured confidence
  const structuredConf = await extractor.fromStructuredResponse(singleResponse.content);

  // Combine using ensemble
  const ensemble = new ConfidenceEnsemble({
    consistency: 0.5,
    linguistic: 0.3,
    structured: 0.2
  });

  const combined = await ensemble.combine([
    consistencyConf,
    analysisConf,
    structuredConf
  ]);

  return {
    answer: singleResponse.content,
    confidence: combined.value,
    sources: combined.provenance.sources
  };
}

Example 3: Production Pipeline with Calibration

import {
  ConfidenceExtractor,
  DomainCalibrator,
  ConfidenceBudgetManager,
  SecurityCalibrator
} from '@prism-lang/confidence';

class ProductionPipeline {
  private extractor = new ConfidenceExtractor();
  private domainCalibrator = new DomainCalibrator({ domain: 'finance' });
  private securityCalibrator = new SecurityCalibrator();
  private budgetManager = new ConfidenceBudgetManager(0.8);

  async processQuery(query: string, context: any) {
    // Step 1: Get initial response and confidence
    const response = await llm.complete(query);
    const initialConf = await this.extractor.extract(response.content);

    // Step 2: Domain calibration
    const domainCalibrated = await this.domainCalibrator.calibrate(
      initialConf,
      'trading',
      context
    );

    // Step 3: Security calibration if needed
    const finalConf = context.sensitive
      ? await this.securityCalibrator.calibrate(domainCalibrated, { riskLevel: 'high' })
      : domainCalibrated;

    // Step 4: Add to budget
    this.budgetManager.add(query, finalConf.value);

    // Step 5: Check if we're still confident enough
    if (!this.budgetManager.isWithinBudget()) {
      throw new Error('Confidence budget exceeded - manual review required');
    }

    return {
      response: response.content,
      confidence: finalConf.value,
      requiresReview: finalConf.value < 0.7
    };
  }
}

Best Practices

1. Choose the Right Method

// For consistent operations, use consistency-based
const conf1 = await extractor.fromConsistency(sampler, { samples: 5 });

// For one-off analysis, use response analysis
const conf2 = await extractor.fromResponseAnalysis(text);

// For structured responses, use structured extraction
const conf3 = await extractor.fromStructuredResponse(text);

2. Always Calibrate for Production

// Domain-specific calibration
const calibrated = await domainCalibrator.calibrate(raw, category, context);

// Security calibration for sensitive operations
const secured = await securityCalibrator.calibrate(calibrated, { riskLevel });

3. Use Ensemble for Critical Decisions

const ensemble = new ConfidenceEnsemble({
  primary: 0.6,
  secondary: 0.3,
  tertiary: 0.1
});

const robust = await ensemble.combine(multipleResults);

4. Track Confidence Over Time

// For time-sensitive data
const temporal = new TemporalConfidence(48, 'exponential');
const current = temporal.apply(original, hoursElapsed);

// For learning systems
const calibrator = new InteractiveCalibrator();
calibrator.recordFeedback(result, wasCorrect);

5. Set Appropriate Thresholds

// Define confidence requirements
const contract = new ConfidenceContractManager({
  'critical_operation': 0.95,
  'standard_operation': 0.8,
  'experimental_feature': 0.6
});

// Validate before proceeding
const validation = contract.validate(actualConfidences);
if (!validation.isValid) {
  console.error('Confidence requirements not met:', validation.failures);
}

Troubleshooting

Low Confidence Scores

// Debug low confidence
const result = await extractor.fromResponseAnalysis(text, {
  checkHedging: true,
  checkCertainty: true,
  checkSpecificity: true,
  checkCompleteness: true
});

console.log('Confidence breakdown:', result.provenance);
// Shows which factors contributed to low confidence

Inconsistent Results

// Use consistency check
const consistency = await extractor.fromConsistency(
  sampler,
  { samples: 10, aggregation: 'median' }
);

if (consistency.value < 0.6) {
  console.warn('High variance in results - consider increasing samples');
}

Calibration Issues

// Validate calibration curves
const testCases = [
  { input: 0.5, expected: 0.6 },
  { input: 0.8, expected: 0.85 }
];

for (const test of testCases) {
  const result = await calibrator.calibrate({ value: test.input });
  console.log(`Input: ${test.input}, Output: ${result.value}, Expected: ${test.expected}`);
}