Building a Chrome extension that leverages AI technologies can significantly enhance user experience by adding powerful features directly into the browser.
In this tutorial, we’ll build a Chrome extension from scratch with AI/ML API, Deepgram Aura, and IndexedDB, covering every step from setup to deployment. We’ll start by setting up the development environment, installing the necessary tools, and configuring the project. Then we’ll dive into the core components of the extension: manifest.json, which contains basic metadata about the extension; scripts.js, which defines how the extension behaves; and styles.css, which adds some styling. We’ll explore how to call Deepgram Aura through AI/ML API and how to use IndexedDB as temporary storage for the generated audio file. Along the way, we’ll discuss best practices for building Chrome extensions, handling user queries, and saving data in the database. By the end of this tutorial, you’ll have a solid foundation for building AI-powered Chrome extensions.
Let’s start with a brief overview of the technologies we’re going to use.
AI/ML API is a game-changing platform for developers and SaaS entrepreneurs looking to integrate cutting-edge AI capabilities into their products. AI/ML API offers a single point of access to over 200 state-of-the-art AI models, covering everything from NLP to computer vision.
It’s Absolutely FREE to get started! Try It Now click
Deep Dive into AI/ML API Documentation; https://docs.aimlapi.com/
A Chrome extension is a small software program that modifies or enhances the functionality of the Google Chrome web browser. Extensions are built using web technologies such as HTML, CSS, and JavaScript, and are designed to serve a single purpose, which makes them easy to understand and use.
Browse Chrome Web Store; https://chromewebstore.google.com/
Deepgram Aura is the first text-to-speech (TTS) AI model designed for real-time, conversational AI agents and applications. It delivers human-like voice quality with unparalleled speed and efficiency, making it a game-changer for building responsive, high-throughput voice AI experiences.
Learn more about technical details; https://aimlapi.com/models/aura
IndexedDB is a low-level API for client-side storage of significant amounts of structured data, including files/blobs. IndexedDB is a JavaScript-based object-oriented database.
Learn more about key concepts and usage; https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API
Building a Chrome extension involves understanding its structure, permissions, and how it interacts with web pages. We’ll start by setting up our development environment and creating the foundational files required for our extension.
Before we begin coding, ensure you have the following:
A minimal Chrome extension for this tutorial requires at least three files: manifest.json, scripts.js, and styles.css.
Let’s create a directory for our project and set up these files.
Open your terminal and run the following commands to create a new folder for your extension:
mkdir my-first-chrome-extension
cd my-first-chrome-extension
Within the new directory, create the necessary files:
touch manifest.json
touch scripts.js
touch styles.css
The manifest.json file is the heart of your Chrome extension. It tells the browser about your extension, what it does, and what permissions it needs. Let's configure this file properly.
{
  "manifest_version": 3,
  "name": "Read Aloud",
  "version": "1.0",
  "description": "Read Aloud anything in any tab",
  "host_permissions": [
    "*://*.aimlapi.com/*"
  ],
  "permissions": [
    "activeTab"
  ],
  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "js": ["scripts.js"],
      "css": ["styles.css"]
    }
  ],
  "icons": {
    "16": "icons/icon.png",
    "48": "icons/icon.png",
    "128": "icons/icon.png"
  }
}
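At a minimum, manifest.json must include manifest_version, name, and version.
Beyond the essential fields, we’ll add description, host_permissions (to allow requests to aimlapi.com), permissions (activeTab), content_scripts (to inject scripts.js and styles.css into every page), and icons.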
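As a quick sanity check before loading the extension, the fields in manifest.json can be verified with a few lines of JavaScript. This is only a sketch: checkManifest is a hypothetical helper, not part of Chrome or of this project, and it covers only the three required keys plus the content script match patterns, assuming the Manifest V3 setup used in this tutorial.

```javascript
// Keys Chrome requires in every manifest.
const REQUIRED_KEYS = ['manifest_version', 'name', 'version'];

// Hypothetical helper: returns a list of problems; an empty list means the basic checks passed.
function checkManifest(manifest) {
  const problems = [];
  for (const key of REQUIRED_KEYS) {
    if (!(key in manifest)) problems.push(`missing required key: ${key}`);
  }
  // This tutorial targets Manifest V3.
  if ('manifest_version' in manifest && manifest.manifest_version !== 3) {
    problems.push('manifest_version should be 3');
  }
  // Every statically declared content script needs at least one match pattern.
  for (const script of manifest.content_scripts ?? []) {
    if (!Array.isArray(script.matches) || script.matches.length === 0) {
      problems.push('each content script needs a non-empty "matches" array');
    }
  }
  return problems;
}

// The same fields as the manifest.json in this tutorial (icons omitted for brevity).
const manifest = {
  manifest_version: 3,
  name: 'Read Aloud',
  version: '1.0',
  description: 'Read Aloud anything in any tab',
  host_permissions: ['*://*.aimlapi.com/*'],
  permissions: ['activeTab'],
  content_scripts: [{ matches: ['<all_urls>'], js: ['scripts.js'], css: ['styles.css'] }],
};

console.log(checkManifest(manifest)); // prints []
```

Chrome runs its own, far more thorough validation when the extension is loaded, so a check like this is mainly useful for catching typos early.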
Now let’s generate an icon for our Chrome extension. Open your browser and go to chatgpt.com. We’ll use a single icon for all sizes, which is perfectly fine.
Enter the following prompt:
Generate black and white icon for my “Read Aloud” Chrome extension.
This extension allows users to highlight the specific text
in the website and listen to it. It’s AI-powered Chrome extension.
The background should be in white and solid.
Wait a couple of seconds until ChatGPT generates the icon image. Click download, rename the file to icon.png, and put it inside an icons folder in the project directory.
With all fields properly defined, your manifest.json enables the browser to understand and correctly load your extension.
The scripts.js file contains the logic that controls how your extension behaves. We'll outline the key functionalities your script needs to implement.
Start by setting up necessary variables:
// Set your AIML_API_KEY key
const AIML_API_KEY = ''; // Replace with your AIML_API_KEY key
// Create the overlay
const overlay = document.createElement('div');
overlay.id = 'read-aloud-overlay';
// Create the "Read Aloud" button
const askButton = document.createElement('button');
askButton.id = 'read-aloud-button';
askButton.innerText = 'Read Aloud';
// Append the button to the overlay
overlay.appendChild(askButton);
// Variables to store selected text and range
let selectedText = '';
let selectedRange = null;
Your extension should detect when a user selects text on a webpage:
document.addEventListener('mouseup', (event) => {
  console.log('mouseup event: ', event);
  //...code
});
const selection = window.getSelection();
const text = selection.toString().trim();
if (text !== '') {
  const range = selection.getRangeAt(0);
  const rect = range.getBoundingClientRect();
  // Set the position of the overlay
  overlay.style.top = `${window.scrollY + rect.top - 50}px`; // Adjust as needed
  overlay.style.left = `${window.scrollX + rect.left + rect.width / 2 - 70}px`; // Adjust to center the overlay
  selectedText = text;
  selectedRange = range;
  // Remove existing overlay if any
  const existingOverlay = document.getElementById('read-aloud-overlay');
  if (existingOverlay) {
    existingOverlay.remove();
  }
  // Append the overlay to the document body
  document.body.appendChild(overlay);
} else {
  // Remove overlay if no text is selected
  const existingOverlay = document.getElementById('read-aloud-overlay');
  if (existingOverlay) {
    existingOverlay.remove();
  }
}
// Function to handle text selection
document.addEventListener('mouseup', (event) => {
  console.log('mouseup event: ', event);
  const selection = window.getSelection();
  const text = selection.toString().trim();
  if (text !== '') {
    const range = selection.getRangeAt(0);
    const rect = range.getBoundingClientRect();
    // Set the position of the overlay
    overlay.style.top = `${window.scrollY + rect.top - 50}px`; // Adjust as needed
    overlay.style.left = `${window.scrollX + rect.left + rect.width / 2 - 70}px`; // Adjust to center the overlay
    selectedText = text;
    selectedRange = range;
    // Remove existing overlay if any
    const existingOverlay = document.getElementById('read-aloud-overlay');
    if (existingOverlay) {
      existingOverlay.remove();
    }
    // Append the overlay to the document body
    document.body.appendChild(overlay);
  } else {
    // Remove overlay if no text is selected
    const existingOverlay = document.getElementById('read-aloud-overlay');
    if (existingOverlay) {
      existingOverlay.remove();
    }
  }
});
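The positioning arithmetic in the handler can also be pulled out into a pure function, which makes it easier to reason about and test outside the browser. The sketch below assumes the 140px overlay width from styles.css and the 50px vertical offset used above; computeOverlayPosition and the clamping to zero are illustrative additions, not part of the original script.

```javascript
const OVERLAY_WIDTH = 140; // overlay width in px, as defined in styles.css
const OVERLAY_GAP = 50;    // vertical offset above the selection, as in the handler

// Hypothetical helper: rect is any object with top, left, and width,
// such as the result of range.getBoundingClientRect().
function computeOverlayPosition(rect, scrollX, scrollY) {
  const top = scrollY + rect.top - OVERLAY_GAP;
  const left = scrollX + rect.left + rect.width / 2 - OVERLAY_WIDTH / 2;
  // Clamp so the overlay never lands above or to the left of the page origin.
  return { top: Math.max(0, top), left: Math.max(0, left) };
}

// A 200px-wide selection, 300px below the top of the viewport, on a page scrolled down 1000px.
const pos = computeOverlayPosition({ top: 300, left: 100, width: 200 }, 0, 1000);
console.log(pos); // { top: 1250, left: 130 }
```

Inside the mouseup handler, the result could be applied with overlay.style.top = `${pos.top}px` and overlay.style.left = `${pos.left}px`.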
When the user clicks the “Read Aloud” button:
if (selectedText.length > 200) {
// ...code
}
// Disable the button
askButton.disabled = true;
askButton.innerText = 'Loading...';
// Send the selected text to your AI/ML API for TTS
const response = await fetch('https://api.aimlapi.com/tts', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${AIML_API_KEY}`, // Replace with your actual API key
  },
  body: JSON.stringify({
    model: '#g1_aura-asteria-en', // Replace with your specific model if needed
    text: selectedText
  })
});
try {
  // ...code
  if (!response.ok) {
    throw new Error('API request failed');
  }
  // ...code
} catch (error) {
  console.error('Error:', error);
  askButton.disabled = false;
  askButton.innerText = 'Read Aloud';
  alert('An error occurred while fetching the audio.');
}
// Play the audio
audio.play();
To manage audio files efficiently:
// Open IndexedDB
const db = await openDatabase();
const audioId = 'audio_' + Date.now(); // Generate a unique ID for the audio
// Save audio blob to IndexedDB
await saveAudioToIndexedDB(db, audioId, audioBlob);
// Retrieve audio blob from IndexedDB
const retrievedAudioBlob = await getAudioFromIndexedDB(db, audioId);
// Create an object URL for the audio and play it
const audioURL = URL.createObjectURL(retrievedAudioBlob);
const audio = new Audio(audioURL);
// Play the audio
audio.play();
// After the audio has finished playing, delete it from IndexedDB
audio.addEventListener('ended', async () => {
  // Revoke the object URL
  URL.revokeObjectURL(audioURL);
  // Delete the audio from IndexedDB
  await deleteAudioFromIndexedDB(db, audioId);
  console.log('Audio deleted from IndexedDB after playback.');
});
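Because cleanup only runs on the ended event, a clip can stay in IndexedDB if the tab is closed mid-playback. Since each ID embeds its creation time ('audio_' + Date.now()), leftovers can be recognized by age. The following is a sketch under that assumption; isStaleAudioId and the age limit are hypothetical and do not appear in the original code.

```javascript
const AUDIO_ID_PREFIX = 'audio_';

// Hypothetical helper: true when the ID has the 'audio_<timestamp>' shape
// and was created more than maxAgeMs milliseconds before `now`.
function isStaleAudioId(id, now, maxAgeMs) {
  if (!id.startsWith(AUDIO_ID_PREFIX)) return false;
  const createdAt = Number(id.slice(AUDIO_ID_PREFIX.length));
  return Number.isFinite(createdAt) && now - createdAt > maxAgeMs;
}

const ids = ['audio_1000', 'audio_9000', 'other'];
console.log(ids.filter((id) => isStaleAudioId(id, 10000, 5000))); // [ 'audio_1000' ]
```

In the extension, the keys could be listed with the object store's getAllKeys() method and each stale one passed to deleteAudioFromIndexedDB.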
// Remove overlay when clicking elsewhere
document.addEventListener('mousedown', (event) => {
  const overlayElement = document.getElementById('read-aloud-overlay');
  if (overlayElement && !overlayElement.contains(event.target)) {
    overlayElement.remove();
    window.getSelection().removeAllRanges();
  }
});
// Delay function
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
// Handle click on "Read Aloud" button using event delegation
document.body.addEventListener('click', async (event) => {
  // Ignore clicks that did not land on the "Read Aloud" button
  if (event.target.id !== 'read-aloud-button') return;
  // Only selections longer than 200 characters are sent to the API
  if (selectedText.length > 200) {
    console.log('selectedText: ', selectedText);
    event.stopPropagation();
    // Disable the button
    askButton.disabled = true;
    askButton.innerText = 'Loading...';
    try {
      // Delay before sending the request (if needed)
      await delay(3000);
      // Send the selected text to your AI/ML API for TTS
      const response = await fetch('https://api.aimlapi.com/tts', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Bearer ${AIML_API_KEY}`, // Replace with your actual API key
        },
        body: JSON.stringify({
          model: '#g1_aura-asteria-en', // Replace with your specific model if needed
          text: selectedText
        })
      });
      if (!response.ok) {
        throw new Error('API request failed');
      }
      // Get the audio data as a blob
      const audioBlob = await response.blob();
      console.log('Audio blob:', audioBlob);
      // Open IndexedDB
      const db = await openDatabase();
      const audioId = 'audio_' + Date.now(); // Generate a unique ID for the audio
      // Save audio blob to IndexedDB
      await saveAudioToIndexedDB(db, audioId, audioBlob);
      // Retrieve audio blob from IndexedDB
      const retrievedAudioBlob = await getAudioFromIndexedDB(db, audioId);
      // Create an object URL for the audio and play it
      const audioURL = URL.createObjectURL(retrievedAudioBlob);
      const audio = new Audio(audioURL);
      // Play the audio
      audio.play();
      // After the audio has finished playing, delete it from IndexedDB
      audio.addEventListener('ended', async () => {
        // Revoke the object URL
        URL.revokeObjectURL(audioURL);
        // Delete the audio from IndexedDB
        await deleteAudioFromIndexedDB(db, audioId);
        console.log('Audio deleted from IndexedDB after playback.');
      });
      // Re-enable the button
      askButton.disabled = false;
      askButton.innerText = 'Read Aloud';
    } catch (error) {
      console.error('Error:', error);
      askButton.disabled = false;
      askButton.innerText = 'Read Aloud';
      alert('An error occurred while fetching the audio.');
    }
  }
});
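One way to keep the click handler shorter is to move the request construction and the length check into small helpers. The sketch below reuses the endpoint, model name, and 200-character threshold from this tutorial; the helper names shouldReadAloud and buildTtsRequest are illustrative assumptions, not part of the original script or of the AI/ML API.

```javascript
const TTS_ENDPOINT = 'https://api.aimlapi.com/tts'; // endpoint used in this tutorial
const TTS_MODEL = '#g1_aura-asteria-en';            // model used in this tutorial
const MIN_LENGTH = 200;                             // threshold from the click handler

// Hypothetical helper: mirrors the `selectedText.length > 200` check.
function shouldReadAloud(text) {
  return text.trim().length > MIN_LENGTH;
}

// Hypothetical helper: builds the arguments for fetch() without sending anything.
function buildTtsRequest(apiKey, text) {
  if (!apiKey) throw new Error('AIML_API_KEY is empty');
  return {
    url: TTS_ENDPOINT,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model: TTS_MODEL, text }),
    },
  };
}

// Possible usage inside the click handler:
//   const { url, options } = buildTtsRequest(AIML_API_KEY, selectedText);
//   const response = await fetch(url, options);
```

Failing early on an empty key gives a clearer error than a rejected API request would.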
IndexedDB is a powerful client-side storage system that allows us to store large amounts of data, including files and blobs.
You’ll need to create four primary functions to interact with IndexedDB:
// Function to open IndexedDB
function openDatabase() {
  return new Promise((resolve, reject) => {
    const request = indexedDB.open('audioDatabase', 1);
    // Runs on first open (or a version bump): create the object store
    request.onupgradeneeded = (event) => {
      const db = event.target.result;
      db.createObjectStore('audios', { keyPath: 'id' });
    };
    request.onsuccess = (event) => {
      resolve(event.target.result);
    };
    request.onerror = (event) => {
      reject(event.target.error);
    };
  });
}
// Function to save audio blob to IndexedDB
function saveAudioToIndexedDB(db, id, blob) {
  return new Promise((resolve, reject) => {
    const transaction = db.transaction(['audios'], 'readwrite');
    const store = transaction.objectStore('audios');
    const request = store.put({ id: id, audio: blob });
    request.onsuccess = () => {
      resolve();
    };
    request.onerror = (event) => {
      reject(event.target.error);
    };
  });
}
// Function to get audio blob from IndexedDB
function getAudioFromIndexedDB(db, id) {
  return new Promise((resolve, reject) => {
    const transaction = db.transaction(['audios'], 'readonly');
    const store = transaction.objectStore('audios');
    const request = store.get(id);
    request.onsuccess = () => {
      if (request.result) {
        resolve(request.result.audio);
      } else {
        // Reject with an Error object so callers get a stack trace
        reject(new Error('Audio not found in IndexedDB'));
      }
    };
    request.onerror = (event) => {
      reject(event.target.error);
    };
  });
}
// Function to delete audio from IndexedDB
function deleteAudioFromIndexedDB(db, id) {
  return new Promise((resolve, reject) => {
    const transaction = db.transaction(['audios'], 'readwrite');
    const store = transaction.objectStore('audios');
    const request = store.delete(id);
    request.onsuccess = () => {
      resolve();
    };
    request.onerror = (event) => {
      reject(event.target.error);
    };
  });
}
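The four functions above repeat the same onsuccess/onerror wiring. A small wrapper that turns any IndexedDB request into a promise can remove that repetition. This is a sketch rather than the tutorial's implementation: promisifyRequest and getAudio are hypothetical names, and the wrapper relies only on the standard result and error properties of a request object.

```javascript
// Hypothetical helper: wraps an IDBRequest-like object in a promise.
function promisifyRequest(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Hypothetical rewrite of getAudioFromIndexedDB using the wrapper.
async function getAudio(db, id) {
  const store = db.transaction(['audios'], 'readonly').objectStore('audios');
  const record = await promisifyRequest(store.get(id));
  if (!record) throw new Error('Audio not found in IndexedDB');
  return record.audio;
}
```

The save and delete functions could be shortened the same way, for example by returning promisifyRequest(store.put({ id, audio: blob })).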
To provide a seamless user experience, your extension should have a clean and intuitive interface.
Define styles for:
#read-aloud-overlay {
  cursor: pointer;
  position: absolute;
  width: 140px;
  height: 40px;
  border-radius: 4px;
  background-color: #333;
  display: flex;
  justify-content: center;
  align-items: center;
  padding: 0 5px;
  box-sizing: border-box;
}
#read-aloud-button {
  color: #fff;
  background: transparent;
  border: none;
  font-size: 14px;
  cursor: pointer;
}
#read-aloud-button:hover {
  color: #000;
  padding: 5px 10px;
  border-radius: 4px;
}
#read-aloud-button:disabled {
  color: #aaa;
  cursor: default;
}
To interact with the AI/ML API and Deepgram Aura model, you’ll need an API key.
touch .env
Now add your API key to the .env file:
AIML_API_KEY=put_your_api_key_here
This won’t work out of the box, though: using .env in a Chrome extension requires extra configuration, which we’ll cover in an upcoming tutorial. For now, set the key directly in scripts.js:
// Set your AIML_API_KEY key
const AIML_API_KEY = ''; // Replace with your AIML_API_KEY key
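As an alternative to hard-coding the key, it could be read from extension storage at runtime. The sketch below is an assumption-heavy illustration: loadApiKey is a hypothetical helper, it expects an object that behaves like chrome.storage.local (whose get method returns a promise in Manifest V3), and using it in the extension would also require adding "storage" to the permissions array in manifest.json, which this tutorial's manifest does not do.

```javascript
// Hypothetical helper: storageArea is expected to behave like chrome.storage.local.
async function loadApiKey(storageArea) {
  const items = await storageArea.get('AIML_API_KEY');
  const key = items.AIML_API_KEY;
  if (typeof key !== 'string' || key === '') {
    throw new Error('AIML_API_KEY is not set in extension storage');
  }
  return key;
}

// Possible usage in scripts.js (assumes the "storage" permission):
//   loadApiKey(chrome.storage.local).then((key) => { /* use key in the fetch call */ });
// The key could be saved once with:
//   chrome.storage.local.set({ AIML_API_KEY: 'your_api_key_here' });
```

Note that a key stored this way is still readable by anyone with access to the browser profile, so it is a convenience rather than a security measure.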
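With all components in place, it’s time to load the extension into Chrome and see it in action. Open chrome://extensions, turn on Developer mode, click Load unpacked, and select the my-first-chrome-extension folder. Then select more than 200 characters of text on any page and click the Read Aloud button.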
In this tutorial, we’ve:
With a solid foundation, you can enhance your extension further:
Congratulations on building a Chrome extension that integrates advanced AI capabilities! This project showcases how combining web technologies with powerful APIs can create engaging and accessible user experiences. You’re now equipped with the knowledge to develop and expand upon this extension or create entirely new ones that leverage AI/ML APIs.
Full implementation available on GitHub; https://github.com/TechWithAbee/Building-a-Chrome-Extension-from-Scratch-with-AI-ML-API-Deepgram-Aura-and-IndexDB-Integration
Should you have any questions or need further assistance, don’t hesitate to reach out via email at abdibrokhim@gmail.com.