3. Working with Text: NLP Tasks

Natural Language Processing (NLP) is a cornerstone of modern AI, allowing computers to understand, interpret, and generate human language. Transformers.js makes many powerful NLP tasks readily available in the browser. In this chapter, we’ll explore some of the most common and impactful NLP tasks.

3.1. Sentiment Analysis (Text Classification)

Sentiment analysis, a form of text classification, involves determining the emotional tone behind a piece of text—whether it’s positive, negative, or neutral. This is incredibly useful for analyzing customer reviews, social media feeds, or survey responses.

3.1.1. Detailed Explanation

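A sentiment analysis pipeline takes a string of text as input and outputs a label along with a confidence score between 0 and 1. The possible labels depend on the model. The model used below, distilbert-base-uncased-finetuned-sst-2-english, was fine-tuned on SST-2, a dataset of movie-review sentences labeled positive or negative, so it only ever returns “POSITIVE” or “NEGATIVE”. Other models add a “NEUTRAL” class or use finer-grained labels.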
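By default the pipeline returns an array holding the top prediction, for example [{ label: 'POSITIVE', score: 0.99 }], which is why the code below reads output[0]. As a sketch, a hypothetical helper such as describeSentiment can turn that shape into the values the page displays; the input here is hand-written to mimic the pipeline's output, not produced by a real model.

```javascript
// Hypothetical helper: map a sentiment pipeline result of shape [{ label, score }]
// to display values. Labels other than POSITIVE/NEGATIVE fall back to 'neutral'.
function describeSentiment(output) {
    const { label, score } = output[0]; // top prediction
    const cssClass = label === 'POSITIVE' ? 'positive'
        : label === 'NEGATIVE' ? 'negative'
        : 'neutral';
    return { label, cssClass, percent: (score * 100).toFixed(2) };
}

// Hand-written input mimicking the pipeline's output shape (not real model output)
const example = [{ label: 'POSITIVE', score: 0.9987 }];
console.log(describeSentiment(example).percent); // 99.87
```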

3.1.2. Code Examples: Movie Review Sentiment

Let’s build on our previous example and create a more robust sentiment analyzer.

<!-- index.html -->
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Sentiment Analyzer</title>
    <style>
        body { font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; margin: 20px; background-color: #f4f7f6; color: #333; display: flex; flex-direction: column; align-items: center; }
        .container { background-color: #ffffff; padding: 30px; border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1); width: 100%; max-width: 600px; text-align: center; }
        h1 { color: #2c3e50; margin-bottom: 20px; }
        textarea { width: calc(100% - 20px); height: 120px; margin-bottom: 15px; padding: 10px; border: 1px solid #ddd; border-radius: 5px; font-size: 16px; resize: vertical; }
        button { background-color: #28a745; color: white; padding: 12px 25px; border: none; border-radius: 5px; cursor: pointer; font-size: 17px; transition: background-color 0.3s ease; }
        button:hover:not(:disabled) { background-color: #218838; }
        button:disabled { background-color: #cccccc; cursor: not-allowed; }
        #output { margin-top: 25px; padding: 20px; border: 1px solid #e0e0e0; border-radius: 8px; background-color: #f0f8ff; text-align: left; }
        #loadingSpinner { border: 4px solid #f3f3f3; border-top: 4px solid #3498db; border-radius: 50%; width: 20px; height: 20px; animation: spin 1s linear infinite; margin-right: 10px; vertical-align: middle; display: none; }
        @keyframes spin { 0% { transform: rotate(0deg); } 100% { transform: rotate(360deg); } }
        .positive { color: #28a745; font-weight: bold; }
        .negative { color: #dc3545; font-weight: bold; }
        .neutral { color: #ffc107; font-weight: bold; }
    </style>
</head>
<body>
    <div class="container">
        <h1>Movie Review Sentiment Analyzer</h1>
        <textarea id="reviewText" placeholder="Paste your movie review here..."></textarea>
        <button id="analyzeButton">
            <span id="loadingSpinner"></span> Analyze Sentiment
        </button>
        <div id="output">
            <h3>Analysis Result:</h3>
            <p id="sentimentResult">Enter a review and click "Analyze Sentiment".</p>
        </div>
    </div>

    <script type="module" src="./app.js"></script>
</body>
</html>
// app.js
import { pipeline } from "https://esm.sh/@huggingface/transformers";

document.addEventListener('DOMContentLoaded', async () => {
    const reviewText = document.getElementById('reviewText');
    const analyzeButton = document.getElementById('analyzeButton');
    const sentimentResult = document.getElementById('sentimentResult');
    const loadingSpinner = document.getElementById('loadingSpinner');

    // Display loading message and disable button while model loads
    sentimentResult.textContent = "Loading sentiment analysis model... this may take a moment.";
    analyzeButton.disabled = true;
    loadingSpinner.style.display = 'inline-block';

    let sentimentClassifier;
    try {
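        // Use WebGPU when the browser provides a GPU adapter; otherwise fall back to WASM (CPU).
        // Requesting 'webgpu' where it is unsupported can make pipeline() fail to load.
        const device = (navigator.gpu && await navigator.gpu.requestAdapter()) ? 'webgpu' : 'wasm';

        // Using a pre-trained model for sentiment analysis
        sentimentClassifier = await pipeline(
            'sentiment-analysis',
            'Xenova/distilbert-base-uncased-finetuned-sst-2-english',
            {
                device,
                // 8-bit quantized weights: a smaller download and lower memory use on either device
                dtype: 'q8',
            }
        );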
        sentimentResult.textContent = "Model loaded. Ready for analysis!";
        analyzeButton.disabled = false;
        loadingSpinner.style.display = 'none';
    } catch (error) {
        console.error("Failed to load sentiment analysis model:", error);
        sentimentResult.textContent = "Error loading model. Check console.";
        loadingSpinner.style.display = 'none';
    }

    analyzeButton.addEventListener('click', async () => {
        const text = reviewText.value.trim();
        if (text === "") {
            sentimentResult.textContent = "Please enter some text to analyze.";
            return;
        }

        analyzeButton.disabled = true;
        loadingSpinner.style.display = 'inline-block';
        sentimentResult.textContent = "Analyzing...";
        sentimentResult.className = ''; // Clear previous styling

        try {
            const output = await sentimentClassifier(text);
            const label = output[0].label;
            const score = (output[0].score * 100).toFixed(2);

            let sentimentClass = 'neutral';
            if (label === 'POSITIVE') {
                sentimentClass = 'positive';
            } else if (label === 'NEGATIVE') {
                sentimentClass = 'negative';
            }

            sentimentResult.innerHTML = `Sentiment: <span class="${sentimentClass}">${label}</span> (Confidence: ${score}%)`;

        } catch (error) {
            console.error("Error during sentiment analysis:", error);
            sentimentResult.textContent = "Error during analysis. Please try again.";
            sentimentResult.className = 'negative';
        } finally {
            analyzeButton.disabled = false;
            loadingSpinner.style.display = 'none';
        }
    });
});

3.1.3. Exercises/Mini-Challenges: Enhanced Sentiment

  1. Multiple Sentences: Modify app.js to accept multiple sentences (e.g., one sentence per line in the textarea). Text-classification pipelines accept an array of strings and return one result per input, so you can classify all lines in a single call. Process each result and display the sentiment for each sentence individually.
  2. Thresholding: Implement a “neutral” threshold. If the top score is below a chosen cutoff (e.g., 60%), classify the text as “NEUTRAL” even though the model output “POSITIVE” or “NEGATIVE”. With a binary model such as the SST-2 one, the top score is always at least 50%, so the cutoff must sit above that. Update the sentimentResult styling accordingly.
  3. Model Comparison: Replace 'Xenova/distilbert-base-uncased-finetuned-sst-2-english' with another English sentiment analysis model from the Hugging Face Hub (e.g., Xenova/bert-base-uncased-finetuned-sst-2-english, or a smaller model found by searching the Hub for sentiment models compatible with Transformers.js). Compare the results and performance.
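For exercises 1 and 2, the page-independent logic fits in two small helpers. This is a minimal sketch under assumptions: splitLines and applyNeutralThreshold are hypothetical names, the 0.6 cutoff is an arbitrary default, and the hard-coded results stand in for what passing the array of lines to sentimentClassifier would return.

```javascript
// Split textarea content into trimmed, non-empty lines (one sentence per line).
function splitLines(text) {
    return text.split(/\r?\n/).map((line) => line.trim()).filter((line) => line.length > 0);
}

// Relabel low-confidence predictions as NEUTRAL; threshold is a fraction (0.6 = 60%).
function applyNeutralThreshold(result, threshold = 0.6) {
    return result.score < threshold ? { ...result, label: 'NEUTRAL' } : result;
}

// Hard-coded results standing in for: const results = await sentimentClassifier(lines);
const lines = splitLines('Great acting!\n\n  The plot dragged.  ');
const results = [
    { label: 'POSITIVE', score: 0.98 },
    { label: 'NEGATIVE', score: 0.55 },
];
lines.forEach((line, i) => {
    const { label, score } = applyNeutralThreshold(results[i]);
    console.log(`${line} -> ${label} (${(score * 100).toFixed(1)}%)`);
});
// Great acting! -> POSITIVE (98.0%)
// The plot dragged. -> NEUTRAL (55.0%)
```

The original result object is left untouched; the helper returns a relabeled copy, so the raw model output stays available for logging.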

3.2. Text Generation

Text generation is the process of creating new human-like text based on a given prompt. This can be used for creative writing, code completion, dialogue generation, and more.

3.2.1. Detailed Explanation

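A text generation pipeline takes a starting prompt as input. At each step, the model computes a probability distribution over the next token; the pipeline either picks the most probable token (greedy decoding) or, with do_sample enabled, samples from that distribution. The chosen token is appended to the sequence, and the process repeats until max_new_tokens tokens have been generated or the model emits an end-of-sequence token.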
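The generation loop can be sketched with a toy stand-in for the model: a hypothetical, hard-coded table of next-token probabilities keyed only on the previous token (a real model conditions on the whole sequence). The sketch uses greedy decoding and stops on an end-of-sequence token or after maxNewTokens steps.

```javascript
// Toy "model": for each last token, a probability distribution over next tokens.
// Purely illustrative; a real model computes these probabilities from the full context.
const NEXT = {
    '<bos>': { the: 0.6, a: 0.4 },
    the: { cat: 0.7, dog: 0.3 },
    a: { dog: 1.0 },
    cat: { sat: 0.8, '<eos>': 0.2 },
    dog: { ran: 0.9, '<eos>': 0.1 },
    sat: { '<eos>': 1.0 },
    ran: { '<eos>': 1.0 },
};

// Greedy decoding: repeatedly append the single most probable next token
// until an end-of-sequence token is chosen or maxNewTokens tokens were added.
function generateGreedy(prompt, maxNewTokens) {
    const tokens = [...prompt];
    for (let step = 0; step < maxNewTokens; step++) {
        const dist = NEXT[tokens[tokens.length - 1]];
        if (!dist) break; // the toy table has no entry for this token
        const [best] = Object.entries(dist).sort((a, b) => b[1] - a[1])[0];
        if (best === '<eos>') break;
        tokens.push(best);
    }
    return tokens;
}

console.log(generateGreedy(['<bos>'], 10).join(' ')); // <bos> the cat sat
console.log(generateGreedy(['<bos>'], 1).join(' '));  // <bos> the
```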

3.2.2. Code Examples: Creative Writing Assistant

Let’s create a simple tool to generate creative text.

<!-- index.html -->
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Text Generator</title>
    <style>
        body { font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; margin: 20px; background-color: #f4f7f6; color: #333; display: flex; flex-direction: column; align-items: center; }
        .container { background-color: #ffffff; padding: 30px; border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1); width: 100%; max-width: 600px; text-align: center; }
        h1 { color: #2c3e50; margin-bottom: 20px; }
        textarea { width: calc(100% - 20px); height: 100px; margin-bottom: 15px; padding: 10px; border: 1px solid #ddd; border-radius: 5px; font-size: 16px; resize: vertical; }
        label { display: block; margin-bottom: 5px; font-weight: bold; text-align: left; width: 100%; max-width: 580px; margin-left: 10px; }
        input[type="number"] { width: 80px; padding: 8px; border: 1px solid #ddd; border-radius: 4px; margin-bottom: 15px; }
        button { background-color: #007bff; color: white; padding: 12px 25px; border: none; border-radius: 5px; cursor: pointer; font-size: 17px; transition: background-color 0.3s ease; margin-bottom: 15px; }
        button:hover:not(:disabled) { background-color: #0056b3; }
        button:disabled { background-color: #cccccc; cursor: not-allowed; }
        #output { margin-top: 25px; padding: 20px; border: 1px solid #e0e0e0; border-radius: 8px; background-color: #fffaf0; text-align: left; white-space: pre-wrap; word-wrap: break-word; }
        #loadingSpinner { border: 4px solid #f3f3f3; border-top: 4px solid #3498db; border-radius: 50%; width: 20px; height: 20px; animation: spin 1s linear infinite; margin-right: 10px; vertical-align: middle; display: none; }
        @keyframes spin { 0% { transform: rotate(0deg); } 100% { transform: rotate(360deg); } }
    </style>
</head>
<body>
    <div class="container">
        <h1>Creative Writing Assistant</h1>
        <textarea id="promptText" placeholder="Start your story here..."></textarea>
        <label for="maxTokens">Max Tokens to Generate:</label>
        <input type="number" id="maxTokens" value="50" min="10" max="250">
        <button id="generateButton">
            <span id="loadingSpinner"></span> Generate Text
        </button>
        <div id="output">
            <h3>Generated Story:</h3>
            <p id="generatedResult">Start typing and generate some creative text!</p>
        </div>
    </div>

    <script type="module" src="./app.js"></script>
</body>
</html>
// app.js
import { pipeline } from "https://esm.sh/@huggingface/transformers";

document.addEventListener('DOMContentLoaded', async () => {
    const promptText = document.getElementById('promptText');
    const maxTokensInput = document.getElementById('maxTokens');
    const generateButton = document.getElementById('generateButton');
    const generatedResult = document.getElementById('generatedResult');
    const loadingSpinner = document.getElementById('loadingSpinner');

    generatedResult.textContent = "Loading text generation model...";
    generateButton.disabled = true;
    loadingSpinner.style.display = 'inline-block';

    let textGenerator;
    try {
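        // Use WebGPU when the browser provides a GPU adapter; otherwise fall back to WASM (CPU).
        const device = (navigator.gpu && await navigator.gpu.requestAdapter()) ? 'webgpu' : 'wasm';

        // Using a smaller, faster model for in-browser text generation
        textGenerator = await pipeline(
            'text-generation',
            'Xenova/distilgpt2', // A distilled version of GPT-2
            {
                device,
                // 4-bit quantized weights; the dtype must match a weight variant the model repository provides
                dtype: 'q4',
            }
        );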
        generatedResult.textContent = "Model loaded. Enter a prompt and generate text!";
        generateButton.disabled = false;
        loadingSpinner.style.display = 'none';
    } catch (error) {
        console.error("Failed to load text generation model:", error);
        generatedResult.textContent = "Error loading model. Check console.";
        loadingSpinner.style.display = 'none';
    }

    generateButton.addEventListener('click', async () => {
        const prompt = promptText.value.trim();
        const maxTokens = parseInt(maxTokensInput.value, 10);

        if (prompt === "") {
            generatedResult.textContent = "Please enter a starting prompt.";
            return;
        }
        if (isNaN(maxTokens) || maxTokens < 10 || maxTokens > 250) {
            generatedResult.textContent = "Max tokens must be between 10 and 250.";
            return;
        }

        generateButton.disabled = true;
        loadingSpinner.style.display = 'inline-block';
        generatedResult.textContent = "Generating...";

        try {
            const output = await textGenerator(prompt, {
                max_new_tokens: maxTokens,
                do_sample: true, // Enable sampling for more creative text
                temperature: 0.7, // Control randomness (lower = more predictable, higher = more creative)
                num_return_sequences: 1, // Number of alternative sequences to return
            });

            // The generated text includes the original prompt
            generatedResult.textContent = output[0].generated_text;

        } catch (error) {
            console.error("Error during text generation:", error);
            generatedResult.textContent = "Error during generation. Please try again.";
        } finally {
            generateButton.disabled = false;
            loadingSpinner.style.display = 'none';
        }
    });
});

3.2.3. Exercises/Mini-Challenges: Advanced Generation

  1. Adjust Generation Parameters: Experiment with the do_sample, temperature, top_k, and top_p parameters in the textGenerator call. temperature, top_k, and top_p only matter when do_sample is true; greedy decoding always picks the most probable token.
    • temperature: Higher values (e.g., 1.0) make the output more random; lower values (e.g., 0.5) make it more focused.
    • top_k: Restricts sampling to the k most probable tokens at each step.
    • top_p: Restricts sampling to the smallest set of most probable tokens whose cumulative probability reaches top_p (nucleus sampling).
    • Add UI controls (sliders or input fields) for temperature and top_k to your HTML.
  2. Multi-sequence Generation: Change num_return_sequences to 3 and display all three generated texts, perhaps in a list or separate paragraphs.
  3. Chatbot Persona: Try a different text generation model (e.g., Xenova/Qwen2-0.5B-Instruct, or search the Hugging Face Hub for text-generation models). Such models are often instruction-tuned. Give one an initial system prompt like “You are a helpful assistant who loves to tell jokes.” and see how it influences the generation. For instruction-tuned models, pass the conversation directly to the generator as an array of messages, [{ role: "system", content: "..." }, { role: "user", content: "..." }]. Recent versions of Transformers.js then apply the model’s chat template, and generated_text holds the conversation with the assistant’s reply as the last message.
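The parameters in exercise 1 reshape the model’s next-token distribution before a token is drawn. Below is a simplified, self-contained sketch of that common technique (temperature scaling, then top-k, then top-p filtering) over a hypothetical four-token vocabulary. It illustrates the idea and is not Transformers.js’s internal implementation.

```javascript
// Softmax with temperature: dividing logits by T < 1 sharpens the distribution, T > 1 flattens it.
function softmax(logits, temperature = 1.0) {
    const scaled = logits.map((x) => x / temperature);
    const max = Math.max(...scaled); // subtract the max for numerical stability
    const exps = scaled.map((x) => Math.exp(x - max));
    const sum = exps.reduce((a, b) => a + b, 0);
    return exps.map((e) => e / sum);
}

// Rescale candidate probabilities so they sum to 1.
function renormalize(cands) {
    const total = cands.reduce((s, c) => s + c.p, 0);
    return cands.map((c) => ({ id: c.id, p: c.p / total }));
}

// Apply temperature, then top-k, then top-p; returns [{ id, p }] sorted by probability.
function filterCandidates(logits, { temperature = 1.0, topK = 0, topP = 1.0 } = {}) {
    let cands = softmax(logits, temperature)
        .map((p, id) => ({ id, p }))
        .sort((a, b) => b.p - a.p);
    if (topK > 0) cands = renormalize(cands.slice(0, topK)); // keep the k most probable tokens
    if (topP < 1.0) {
        // Keep the smallest prefix whose cumulative probability reaches topP.
        const kept = [];
        let cumulative = 0;
        for (const c of cands) {
            kept.push(c);
            cumulative += c.p;
            if (cumulative >= topP) break;
        }
        cands = renormalize(kept);
    }
    return cands;
}

// Draw one token id from the filtered distribution; rand is injectable for reproducibility.
function sample(cands, rand = Math.random) {
    let r = rand();
    for (const c of cands) {
        r -= c.p;
        if (r < 0) return c.id;
    }
    return cands[cands.length - 1].id; // guard against floating-point rounding
}

const exampleLogits = [2.0, 1.0, 0.5, -1.0]; // hypothetical scores for a 4-token vocabulary
console.log(filterCandidates(exampleLogits, { topK: 2 }).map((c) => c.id)); // [ 0, 1 ]
console.log(sample(filterCandidates(exampleLogits, { topK: 1 }))); // 0
```

With topK: 1 only one candidate survives, so sampling becomes equivalent to greedy decoding; raising temperature or top_p widens the pool of tokens that can be drawn.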

3.3. Summarization

Text summarization condenses a longer text into a shorter, coherent version, retaining the most important information. This is valuable for quickly grasping the main points of articles, reports, or documents.

3.3.1. Detailed Explanation

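Summarization models are typically encoder-decoder architectures: the encoder reads the long input sequence, and the decoder generates a shorter output sequence from that representation. You can control the length of the summary with the min_new_tokens and max_new_tokens parameters; both count tokens (subword units), not words. The input side is limited too: BART-based models such as distilbart-cnn-6-6 accept at most 1,024 input tokens.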

3.3.2. Code Examples: Article Summarizer

<!-- index.html -->
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Article Summarizer</title>
    <style>
        body { font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; margin: 20px; background-color: #f4f7f6; color: #333; display: flex; flex-direction: column; align-items: center; }
        .container { background-color: #ffffff; padding: 30px; border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1); width: 100%; max-width: 700px; text-align: center; }
        h1 { color: #2c3e50; margin-bottom: 20px; }
        textarea { width: calc(100% - 20px); height: 180px; margin-bottom: 15px; padding: 10px; border: 1px solid #ddd; border-radius: 5px; font-size: 16px; resize: vertical; }
        label { display: block; margin-bottom: 5px; font-weight: bold; text-align: left; width: 100%; max-width: 680px; margin-left: 10px; }
        input[type="number"] { width: 80px; padding: 8px; border: 1px solid #ddd; border-radius: 4px; margin-bottom: 15px; margin-right: 15px;}
        button { background-color: #ff5722; color: white; padding: 12px 25px; border: none; border-radius: 5px; cursor: pointer; font-size: 17px; transition: background-color 0.3s ease; margin-bottom: 15px; }
        button:hover:not(:disabled) { background-color: #e64a19; }
        button:disabled { background-color: #cccccc; cursor: not-allowed; }
        #output { margin-top: 25px; padding: 20px; border: 1px solid #e0e0e0; border-radius: 8px; background-color: #e8f5e9; text-align: left; }
        #loadingSpinner { border: 4px solid #f3f3f3; border-top: 4px solid #ff5722; border-radius: 50%; width: 20px; height: 20px; animation: spin 1s linear infinite; margin-right: 10px; vertical-align: middle; display: none; }
        @keyframes spin { 0% { transform: rotate(0deg); } 100% { transform: rotate(360deg); } }
    </style>
</head>
<body>
    <div class="container">
        <h1>Article Summarizer</h1>
        <textarea id="articleText" placeholder="Paste your article here..."></textarea>
        <div style="display: flex; justify-content: flex-start; align-items: center; width: 100%; max-width: 680px; margin-left: 10px; margin-bottom: 15px;">
            <label for="minLength" style="width: auto; margin-right: 5px;">Min Length (tokens):</label>
            <input type="number" id="minLength" value="30" min="10" max="100">
            <label for="maxLength" style="width: auto; margin-right: 5px;">Max Length (tokens):</label>
            <input type="number" id="maxLength" value="150" min="50" max="300">
        </div>
        <button id="summarizeButton">
            <span id="loadingSpinner"></span> Summarize Article
        </button>
        <div id="output">
            <h3>Summary:</h3>
            <p id="summaryResult">Paste an article and click "Summarize Article" to get a summary.</p>
        </div>
    </div>

    <script type="module" src="./app.js"></script>
</body>
</html>
// app.js
import { pipeline } from "https://esm.sh/@huggingface/transformers";

document.addEventListener('DOMContentLoaded', async () => {
    const articleText = document.getElementById('articleText');
    const minLengthInput = document.getElementById('minLength');
    const maxLengthInput = document.getElementById('maxLength');
    const summarizeButton = document.getElementById('summarizeButton');
    const summaryResult = document.getElementById('summaryResult');
    const loadingSpinner = document.getElementById('loadingSpinner');

    summaryResult.textContent = "Loading summarization model...";
    summarizeButton.disabled = true;
    loadingSpinner.style.display = 'inline-block';

    let summarizer;
    try {
        // Use WebGPU when the browser provides a GPU adapter; otherwise fall back to WASM (CPU).
        const device = (navigator.gpu && await navigator.gpu.requestAdapter()) ? 'webgpu' : 'wasm';

        // distilbart-cnn-6-6 is a distilled version of bart-large-cnn with fewer layers:
        // a little less quality for a smaller download. 'Xenova/t5-small' is an even lighter option.
        summarizer = await pipeline(
            'summarization',
            'Xenova/distilbart-cnn-6-6',
            {
                device,
                dtype: 'q4',
            }
        );
        summaryResult.textContent = "Model loaded. Ready to summarize!";
        summarizeButton.disabled = false;
        loadingSpinner.style.display = 'none';
    } catch (error) {
        console.error("Failed to load summarization model:", error);
        summaryResult.textContent = "Error loading model. Check console.";
        loadingSpinner.style.display = 'none';
    }

    summarizeButton.addEventListener('click', async () => {
        const text = articleText.value.trim();
        const minLength = parseInt(minLengthInput.value, 10);
        const maxLength = parseInt(maxLengthInput.value, 10);

        if (text === "") {
            summaryResult.textContent = "Please paste an article to summarize.";
            return;
        }
        // Both values are token counts (min_new_tokens / max_new_tokens), not words.
        if (isNaN(minLength) || isNaN(maxLength)) {
            summaryResult.textContent = "Min and max length must be numbers.";
            return;
        }
        if (minLength < 10 || minLength > maxLength) {
            summaryResult.textContent = "Min length must be at least 10 and no greater than max length.";
            return;
        }
        if (maxLength > 300) {
            // Matches the max attribute of the #maxLength input
            summaryResult.textContent = "Max length must be 300 or less.";
            return;
        }

        summarizeButton.disabled = true;
        loadingSpinner.style.display = 'inline-block';
        summaryResult.textContent = "Summarizing...";

        try {
            const output = await summarizer(text, {
                min_new_tokens: minLength,
                max_new_tokens: maxLength,
            });
            summaryResult.textContent = output[0].summary_text;

        } catch (error) {
            console.error("Error during summarization:", error);
            summaryResult.textContent = "Error during summarization. Please try again.";
        } finally {
            summarizeButton.disabled = false;
            loadingSpinner.style.display = 'none';
        }
    });
});

3.3.3. Exercises/Mini-Challenges: Summarization Refinement

  1. Summarize a URL: Instead of pasting text, modify the app to take a URL as input. Because of browser CORS restrictions, fetching arbitrary pages usually requires a backend proxy (or a browser extension with broader permissions). For this exercise, you can simulate it with a large pre-fetched block of text that a user might copy from a web page. Focus on handling a much longer input gracefully, for example by splitting it into chunks that fit the model’s maximum input length, summarizing each chunk, and combining the results.
  2. Interactive Length Adjustment: As the user changes the min and max length inputs (or sliders, if you switch to range inputs), re-run the summarization automatically for shorter texts, with a debounce so that rapid changes do not trigger one call per keystroke. This provides a more interactive user experience.
  3. Extractive vs. Abstractive: Research the difference between extractive summarization (selecting existing sentences) and abstractive summarization (generating new sentences). Most summarization models available for Transformers.js, including distilbart-cnn-6-6, are abstractive; explore whether any Transformers.js-compatible model is known for more extractive behavior, try integrating it, and compare the output.
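For exercises 1 and 2, the helper logic can be sketched as follows, assuming ordinary sentence punctuation. chunkBySentences (a hypothetical name) packs whole sentences into chunks under a word budget, because summarization models accept only a limited number of input tokens. Word count is only a rough proxy for token count, so keep the budget conservative (the 400-word default is an arbitrary assumption) or count tokens with the pipeline’s tokenizer instead. debounce is the standard pattern for delaying a call until input settles.

```javascript
// Split text into chunks of at most maxWords words, breaking only between sentences.
// Word count approximates token count; a single overlong sentence becomes its own chunk.
function chunkBySentences(text, maxWords = 400) {
    const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [];
    const chunks = [];
    let current = [];
    let count = 0;
    for (const s of sentences) {
        const n = s.trim().split(/\s+/).filter(Boolean).length;
        if (count + n > maxWords && current.length > 0) {
            chunks.push(current.join('').trim());
            current = [];
            count = 0;
        }
        current.push(s);
        count += n;
    }
    if (current.length > 0) chunks.push(current.join('').trim());
    return chunks;
}

// Debounce: run fn only after `wait` ms have passed without another call.
function debounce(fn, wait) {
    let timer;
    return (...args) => {
        clearTimeout(timer);
        timer = setTimeout(() => fn(...args), wait);
    };
}

const article = 'One two three. Four five. Six seven eight nine. Ten.';
console.log(chunkBySentences(article, 5));
// → ['One two three. Four five.', 'Six seven eight nine. Ten.']

// In the app (hypothetical wiring): summarize each chunk with
//   await summarizer(chunk, { min_new_tokens, max_new_tokens })
// and join the summary_text values; for exercise 2, register
//   minLengthInput.addEventListener('input', debounce(resummarize, 500))
// where resummarize is your own function that re-runs the summary.
```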

3.4. Named Entity Recognition (NER)

Named Entity Recognition (NER) identifies and classifies named entities in text into pre-defined categories such as person names, organizations, locations, dates, etc.

3.4.1. Detailed Explanation

NER models scan text and assign a tag to each token (word or subword) that is part of an entity. For example, in “Tim Cook visited Apple Inc. in California,” a model trained on the common CoNLL-style label set would tag “Tim Cook” as a person (PER), “Apple Inc.” as an organization (ORG), and “California” as a location (LOC). The exact label set depends on the model.
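Token-classification models usually emit tags in the BIO scheme: B-X marks the first word of an entity of type X, I-X a following word of the same entity, and O a word outside any entity (this is where class names such as B-PER and I-PER in the stylesheet below come from). Here is a minimal, word-level sketch of decoding BIO tags into whole entities, using hand-written tags for the example sentence rather than real model output.

```javascript
// Decode word-level BIO tags into entities. 'O' closes any open entity;
// 'B-X' always starts a new entity; 'I-X' extends the open entity if its type is X.
// Words are joined with single spaces.
function decodeBio(tagged) {
    const entities = [];
    let current = null;
    for (const { word, tag } of tagged) {
        if (tag === 'O') {
            current = null;
            continue;
        }
        const [prefix, ...rest] = tag.split('-');
        const type = rest.join('-'); // 'B-PER' -> 'PER'
        if (prefix === 'I' && current !== null && current.type === type) {
            current.text += ' ' + word;
        } else {
            // 'B-' tag, or an 'I-' tag with no matching open entity: start a new one
            current = { type, text: word };
            entities.push(current);
        }
    }
    return entities;
}

// Hand-written tags for the example sentence (not real model output)
const tagged = [
    { word: 'Tim', tag: 'B-PER' }, { word: 'Cook', tag: 'I-PER' },
    { word: 'visited', tag: 'O' },
    { word: 'Apple', tag: 'B-ORG' }, { word: 'Inc.', tag: 'I-ORG' },
    { word: 'in', tag: 'O' },
    { word: 'California', tag: 'B-LOC' },
];
console.log(decodeBio(tagged).map((e) => `${e.type}: ${e.text}`).join(', '));
// PER: Tim Cook, ORG: Apple Inc., LOC: California
```

Real models tag subword tokens rather than whole words, so a production version also has to merge word pieces before or during this step.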

3.4.2. Code Examples: Entity Extractor

<!-- index.html -->
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Named Entity Recognition</title>
    <style>
        body { font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; margin: 20px; background-color: #f4f7f6; color: #333; display: flex; flex-direction: column; align-items: center; }
        .container { background-color: #ffffff; padding: 30px; border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1); width: 100%; max-width: 700px; text-align: center; }
        h1 { color: #2c3e50; margin-bottom: 20px; }
        textarea { width: calc(100% - 20px); height: 120px; margin-bottom: 15px; padding: 10px; border: 1px solid #ddd; border-radius: 5px; font-size: 16px; resize: vertical; }
        button { background-color: #673ab7; color: white; padding: 12px 25px; border: none; border-radius: 5px; cursor: pointer; font-size: 17px; transition: background-color 0.3s ease; margin-bottom: 15px; }
        button:hover:not(:disabled) { background-color: #512da8; }
        button:disabled { background-color: #cccccc; cursor: not-allowed; }
        #output { margin-top: 25px; padding: 20px; border: 1px solid #e0e0e0; border-radius: 8px; background-color: #f3e5f5; text-align: left; }
        #loadingSpinner { border: 4px solid #f3f3f3; border-top: 4px solid #673ab7; border-radius: 50%; width: 20px; height: 20px; animation: spin 1s linear infinite; margin-right: 10px; vertical-align: middle; display: none; }
        @keyframes spin { 0% { transform: rotate(0deg); } 100% { transform: rotate(360deg); } }
        .entity { padding: 2px 5px; border-radius: 3px; font-weight: bold; margin-right: 5px; }
        .B-PER, .I-PER { background-color: #e0f7fa; color: #00838f; } /* Person */
        .B-ORG, .I-ORG { background-color: #ffe0b2; color: #ef6c00; } /* Organization */
        .B-LOC, .I-LOC { background-color: #c8e6c9; color: #388e3c; } /* Location */
        .B-MISC, .I-MISC { background-color: #ede7f6; color: #5e35b1; } /* Miscellaneous */
    </style>
</head>
<body>
    <div class="container">
        <h1>Named Entity Recognition (NER)</h1>
        <textarea id="nerText" placeholder="Enter text to extract entities (e.g., 'Google's CEO Sundar Pichai announced new AI initiatives in California')."></textarea>
        <button id="extractButton">
            <span id="loadingSpinner"></span> Extract Entities
        </button>
        <div id="output">
            <h3>Entities:</h3>
            <p id="nerResult">Enter text to see named entities.</p>
        </div>
        <div id="annotatedText" style="margin-top: 20px; padding: 15px; border: 1px dashed #ddd; background-color: #fdfdfd; text-align: left;">
            <h3>Annotated Text:</h3>
            <p id="annotatedResult">No text analyzed yet.</p>
        </div>
    </div>

    <script type="module" src="./app.js"></script>
</body>
</html>
// app.js
import { pipeline } from "https://esm.sh/@huggingface/transformers";

document.addEventListener('DOMContentLoaded', async () => {
    const nerText = document.getElementById('nerText');
    const extractButton = document.getElementById('extractButton');
    const nerResult = document.getElementById('nerResult');
    const annotatedResult = document.getElementById('annotatedResult');
    const loadingSpinner = document.getElementById('loadingSpinner');

    nerResult.textContent = "Loading NER model...";
    extractButton.disabled = true;
    loadingSpinner.style.display = 'inline-block';

    let nerPipeline;
    try {
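        // Use WebGPU when the browser provides a GPU adapter; otherwise fall back to WASM (CPU).
        const device = (navigator.gpu && await navigator.gpu.requestAdapter()) ? 'webgpu' : 'wasm';

        // A BERT-based NER model fine-tuned on several languages
        nerPipeline = await pipeline(
            'ner',
            'Xenova/bert-base-multilingual-cased-ner-hrl', // A multilingual NER model
            {
                device,
                dtype: 'q8', // 8-bit quantization: smaller and faster, at a small cost in accuracy
            }
        );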
        nerResult.textContent = "Model loaded. Enter text and extract entities!";
        extractButton.disabled = false;
        loadingSpinner.style.display = 'none';
    } catch (error) {
        console.error("Failed to load NER model:", error);
        nerResult.textContent = "Error loading model. Check console.";
        loadingSpinner.style.display = 'none';
    }

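    // Escape text before inserting it with innerHTML, so user input cannot inject markup.
    const escapeHtml = (s) => s.replace(/[&<>"']/g, (c) => (
        { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' }[c]
    ));
    const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

    // The 'ner' (token-classification) pipeline returns one object per token, such as
    // { entity: 'B-PER', score, index, word }. Depending on the Transformers.js version,
    // tokens may not be grouped into whole entities and character offsets (start/end)
    // may be missing, so adjacent B-/I- tokens and '##' subword pieces are merged here.
    function groupEntities(tokens) {
        const groups = [];
        for (const token of tokens) {
            const label = token.entity ?? token.entity_group; // e.g. 'B-PER'
            const type = label.replace(/^[BI]-/, '');         // 'B-PER' -> 'PER'
            const isSubword = token.word.startsWith('##');    // WordPiece continuation piece
            const piece = isSubword ? token.word.slice(2) : token.word;
            const prev = groups[groups.length - 1];
            const continuesPrev = prev !== undefined && prev.lastTokenIndex === token.index - 1 &&
                (isSubword || (prev.type === type && !label.startsWith('B-')));
            if (continuesPrev) {
                if (isSubword) prev.pieces[prev.pieces.length - 1] += piece;
                else prev.pieces.push(piece);
                prev.scores.push(token.score);
                prev.lastTokenIndex = token.index;
            } else {
                groups.push({ label, type, pieces: [piece], scores: [token.score], lastTokenIndex: token.index });
            }
        }
        // Average the token scores of each entity
        return groups.map((g) => ({
            ...g,
            score: (100 * g.scores.reduce((a, b) => a + b, 0) / g.scores.length).toFixed(2),
        }));
    }

    // Locate each entity in the original text, left to right. Whitespace between pieces is
    // optional because decoded tokens can gain or lose spaces around punctuation.
    function locateEntities(text, groups) {
        let cursor = 0;
        return groups.map((g) => {
            const pattern = new RegExp(g.pieces.map(escapeRegExp).join('\\s*'), 'gi');
            pattern.lastIndex = cursor;
            const match = pattern.exec(text);
            if (!match) return { ...g, start: -1, end: -1 }; // listed, but not highlighted
            cursor = match.index + match[0].length;
            return { ...g, start: match.index, end: cursor };
        });
    }

    extractButton.addEventListener('click', async () => {
        const text = nerText.value.trim();
        if (text === "") {
            nerResult.textContent = "Please enter some text to extract entities.";
            annotatedResult.textContent = "No text analyzed yet.";
            return;
        }

        extractButton.disabled = true;
        loadingSpinner.style.display = 'inline-block';
        nerResult.textContent = "Extracting entities...";
        annotatedResult.textContent = "Annotating text...";

        try {
            const tokens = await nerPipeline(text);
            const entities = locateEntities(text, groupEntities(tokens));

            let annotatedHtml = '';
            let lastEnd = 0;
            const listItems = entities.map((e) => {
                const shown = e.start >= 0 ? text.slice(e.start, e.end) : e.pieces.join(' ');
                if (e.start >= 0) {
                    // Plain text before the entity, then the highlighted entity itself
                    annotatedHtml += escapeHtml(text.slice(lastEnd, e.start));
                    annotatedHtml += `<span class="entity ${e.label}" title="Confidence: ${e.score}%">${escapeHtml(shown)}</span>`;
                    lastEnd = e.end;
                }
                return `<li><span class="entity ${e.label}">${escapeHtml(shown)}</span> - ${e.type} (Confidence: ${e.score}%)</li>`;
            });
            // Add any remaining text after the last entity
            annotatedHtml += escapeHtml(text.slice(lastEnd));

            nerResult.innerHTML = entities.length > 0 ? `<ul>${listItems.join('')}</ul>` : "No entities found.";
            annotatedResult.innerHTML = annotatedHtml;

        } catch (error) {
            console.error("Error during NER:", error);
            nerResult.textContent = "Error during entity extraction. Please try again.";
            annotatedResult.textContent = "Error during annotation.";
        } finally {
            extractButton.disabled = false;
            loadingSpinner.style.display = 'none';
        }
    });
});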

3.4.3. Exercises/Mini-Challenges: NER Application

  1. Visualize Entities Better: Enhance the annotatedText display. The highlighted spans already carry a title attribute showing the confidence score; extend the tooltip to include the entity type, or replace it with a custom styled tooltip.
  2. Filter by Entity Type: Add checkboxes or a dropdown that let users choose which entity types (e.g., PER, ORG, LOC, MISC, depending on the model’s labels) are listed in the nerResult list and highlighted in the annotatedText.
  3. Cross-Lingual NER: The model used above, Xenova/bert-base-multilingual-cased-ner-hrl, is multilingual. Demonstrate this by extracting entities from text in other languages (e.g., German, Spanish); you will need to provide example texts in those languages. For comparison, try an English-only model such as Xenova/dslim-bert-base-NER (or a similar model on the Hugging Face Hub) and observe how it handles the same texts.
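Exercise 2 comes down to a set-membership filter applied before rendering. A minimal sketch, assuming entities have already been grouped into a hypothetical { type, text } shape; in the page, the selected types would come from the checked checkboxes (for example, by collecting the value of each checked input).

```javascript
// Keep only entities whose type is among the selected types.
// The { type, text } shape is a hypothetical, already-grouped representation.
function filterByType(entities, selectedTypes) {
    const allowed = new Set(selectedTypes);
    return entities.filter((e) => allowed.has(e.type));
}

const entities = [
    { type: 'PER', text: 'Sundar Pichai' },
    { type: 'ORG', text: 'Google' },
    { type: 'LOC', text: 'California' },
];
console.log(filterByType(entities, ['PER', 'LOC']).map((e) => e.text).join(', '));
// Sundar Pichai, California
```

Filtering the grouped entities once and rendering both the list and the annotated text from the same filtered array keeps the two views consistent.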