AI Inference Guide
Code Examples
Getting Started Examples
Practical code examples to help you get started with different inference approaches. Copy and modify these examples for your own projects.
WebLLM Browser Example
import { CreateMLCEngine } from "@mlc-ai/web-llm";
// Initialize the engine
const engine = await CreateMLCEngine(
  "Llama-3.2-1B-Instruct-q4f32_1-MLC",
  { 
    initProgressCallback: (progress) => {
      // progress.progress is a fraction between 0 and 1
      console.log('Loading:', Math.round(progress.progress * 100) + '%');
    }
  }
);
// Generate text
const response = await engine.chat.completions.create({
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello! How are you?" }
  ],
  temperature: 0.8,
  max_tokens: 100
});
console.log(response.choices[0].message.content);
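
WebLLM mirrors the OpenAI streaming interface as well. A minimal sketch, reusing the engine created above; with stream: true the call returns an async iterable of delta chunks:
// Stream tokens as they are generated
const stream = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Tell me a short joke." }],
  stream: true
});
let reply = "";
for await (const chunk of stream) {
  reply += chunk.choices[0]?.delta?.content || "";
}
console.log(reply);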

BrowserAI Example
import { BrowserAI } from '@browserai/browserai';
const browserAI = new BrowserAI();
// Load model with progress tracking
await browserAI.loadModel('llama-3.2-1b-instruct', {
  quantization: 'q4f16_1',
  onProgress: (progress) => console.log('Loading:', progress.progress + '%')
});
// Generate text
const response = await browserAI.generateText('Hello, how are you?');
console.log(response.choices[0].message.content);
// Streaming example
const chunks = await browserAI.generateText('Write a story', {
  stream: true,
  temperature: 0.8
});
for await (const chunk of chunks) {
  console.log(chunk.choices[0]?.delta?.content || '');
}

Ollama Local Server
# Install and run Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve
# Pull and run a model
ollama pull llama3.2:1b
ollama run llama3.2:1b "Hello, world!"

// Query the REST API from JavaScript
fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.2:1b',
    prompt: 'Hello!',
    stream: false
  })
}).then(r => r.json()).then(console.log);
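
Ollama also exposes a chat-style endpoint for multi-turn conversations. A sketch of /api/chat; with stream: false the reply arrives as a single JSON object whose message.content holds the generated text:
// Multi-turn chat via the /api/chat endpoint
fetch('http://localhost:11434/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.2:1b',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Hello!' }
    ],
    stream: false
  })
}).then(r => r.json()).then(data => console.log(data.message.content));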

LM Studio Desktop App
// LM Studio provides a local server compatible with the OpenAI API
const response = await fetch('http://localhost:1234/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer lm-studio'
  },
  body: JSON.stringify({
    model: 'llama-3.2-1b-instruct',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Hello!' }
    ],
    temperature: 0.7,
    max_tokens: 100
  })
});
const data = await response.json();
console.log(data.choices[0].message.content);
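
Because the server speaks the OpenAI wire format, you can also point the official openai npm client at it instead of hand-rolling fetch calls. A minimal sketch, assuming the server runs on its default port:
import OpenAI from 'openai';

// Reuse the OpenAI SDK against the local LM Studio server
const client = new OpenAI({
  baseURL: 'http://localhost:1234/v1',
  apiKey: 'lm-studio' // placeholder; local servers typically ignore the key
});

const completion = await client.chat.completions.create({
  model: 'llama-3.2-1b-instruct',
  messages: [{ role: 'user', content: 'Hello!' }]
});
console.log(completion.choices[0].message.content);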

Provider API Example (Together AI)
import Together from "together-ai";
const together = new Together({
  apiKey: process.env.TOGETHER_API_KEY,
});
const response = await together.chat.completions.create({
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain quantum computing in simple terms." }
  ],
  model: "meta-llama/Llama-3.2-3B-Instruct-Turbo",
  max_tokens: 500,
  temperature: 0.7,
  stream: true,
});
// Handle streaming response
for await (const chunk of response) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Vision Model Example
import { CreateMLCEngine } from "@mlc-ai/web-llm";
// Load a vision-language model (check WebLLM's prebuilt model list for currently available IDs)
const engine = await CreateMLCEngine("Llava-1.5-7B-q4f16_1-MLC");
// Process image and text
const response = await engine.chat.completions.create({
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What do you see in this image?" },
        {
          type: "image_url",
          image_url: { url: "data:image/jpeg;base64,..." }
        }
      ]
    }
  ]
});
console.log(response.choices[0].message.content);
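
The image_url block expects a complete data URL; the "..." above stands in for the base64 payload. A small browser-side helper using the standard FileReader API can produce one from a file input:
// Convert a user-selected image file into a base64 data URL
function fileToDataURL(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result);
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
}

// Usage: const url = await fileToDataURL(fileInput.files[0]);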

Best Practices
Performance Optimization
- Use quantized models for faster inference
- Implement proper caching strategies (see the sketch after this list)
- Optimize batch sizes for throughput
- Monitor memory usage and clean up unused models
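
Caching can start as simple memoization of repeated prompts. A minimal in-memory sketch; the key scheme and eviction policy here are illustrative, not prescriptive:
// Memoize completions keyed by prompt; evict the oldest entry past a size cap
const cache = new Map();
const MAX_ENTRIES = 100;

async function cachedGenerate(engine, prompt) {
  if (cache.has(prompt)) return cache.get(prompt);
  const response = await engine.chat.completions.create({
    messages: [{ role: 'user', content: prompt }]
  });
  const text = response.choices[0].message.content;
  if (cache.size >= MAX_ENTRIES) {
    cache.delete(cache.keys().next().value); // Maps iterate in insertion order
  }
  cache.set(prompt, text);
  return text;
}
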
User Experience
- Show loading progress for model downloads
- Implement streaming for long responses
- Provide fallback options when local inference fails (see the sketch after this list)
- Handle errors gracefully
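
Fallbacks and error handling fit naturally together: try local inference first and degrade to a hosted API. A sketch with hypothetical generateLocal and generateHosted helpers standing in for any of the approaches shown above:
// Try the local engine first; fall back to a hosted provider on failure
async function generateWithFallback(prompt) {
  try {
    return await generateLocal(prompt); // e.g. WebLLM or Ollama (hypothetical helper)
  } catch (err) {
    console.warn('Local inference failed, falling back:', err.message);
    return await generateHosted(prompt); // e.g. Together AI (hypothetical helper)
  }
}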