2024-09-09 web, development, javascript

Script to Process Phrases with OpenAI

By O. Wolfson

Introduction

This tutorial guides you through creating a Python script that reads Italian phrases from a JSON file, obtains an explanation for each one using OpenAI's gpt-3.5-turbo model, and saves those explanations to another JSON file. A key aspect of this project is the use of a virtual environment for better dependency management and project isolation.

Prerequisites

  • Basic understanding of Python.
  • An OpenAI API key.
  • Python environment with virtualenv installed.
  • A JSON file with Italian phrases.

Step 1: Setting Up a Virtual Environment

Before starting, it’s crucial to set up a virtual environment. This keeps your project dependencies separate from your global Python installation.

  1. Create a Virtual Environment:

    bash
    python -m venv openai-env
    

    This command creates a new virtual environment named openai-env.

  2. Activate the Virtual Environment:

    • On Windows:
      bash
      openai-env\Scripts\activate
      
    • On macOS and Linux:
      bash
      source openai-env/bin/activate
      
  3. Install Required Packages: With the environment activated, install the openai and tqdm packages (tqdm provides the progress bar used later in the script):

    bash
    pip install openai tqdm
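
If you want to confirm that both packages landed in the virtual environment (and check which openai version you got, which matters for the code in Step 7), you can inspect them:

bash
pip show openai tqdm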
    

Step 2: Importing Libraries

In your Python script, import the necessary libraries:

python
import openai
import json
from tqdm import tqdm

Step 3: OpenAI API Key Configuration

Set your OpenAI API key:

python
openai.api_key = "your-api-key"

You can get an API key from OpenAI. Sign up for an account and create a new API key. This may require you to enter your credit card information and pay a small fee, depending on the account type and your usage.
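
Hardcoding the key is fine for a quick test, but it is safer to read it from an environment variable so the key never lands in your code or version control. A minimal sketch, assuming you have exported OPENAI_API_KEY in your shell:

python
import os
import openai

# Read the key from the OPENAI_API_KEY environment variable
# instead of hardcoding it in the script.
openai.api_key = os.environ["OPENAI_API_KEY"]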

Step 4: Reading Input Data

Load your JSON file containing the Italian phrases:

python
with open("italian-language-phrases.json", "r") as file:
    phrases = json.load(file)

See the JSON data file used in this example.
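
The data file itself isn't reproduced here, but based on the fields the script reads (id and phrase), it is assumed to look something like this (the phrases are just examples):

json
[
    { "id": 1, "phrase": "In bocca al lupo" },
    { "id": 2, "phrase": "Non vedo l'ora" }
]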

Step 5: Preparing for Phrase Processing

Check for an existing explanations file. If not found, create an empty list:

python
try:
    with open("italian-phrase-explanations.json", "r") as file:
        explanations = json.load(file)
except FileNotFoundError:
    explanations = []
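
For reference, each entry the script appends to italian-phrase-explanations.json carries an id and an explanation, so after a run the output file is expected to look roughly like this (the explanation text is an illustrative placeholder):

json
[
    {
        "id": 1,
        "explanation": "**Buongiorno** means *good morning* ..."
    }
]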

Step 6: Progress Tracking and Time Estimation

Utilize the tqdm library for a progress bar:

python
total_phrases = len(phrases)
estimated_time = total_phrases * 30  # Assuming 30 seconds per phrase

print(f"Starting to process {total_phrases} phrases. Estimated time: {estimated_time//60} minutes {estimated_time%60} seconds.\n")

for item in tqdm(phrases, desc="Processing phrases", unit="phrase"):
    # ...processing logic here...
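
One practical note: plain print calls can break up tqdm's progress bar. If that bothers you, tqdm.write prints a line above the bar instead; a small sketch, reusing the phrases list loaded in Step 4:

python
from tqdm import tqdm

for item in tqdm(phrases, desc="Processing phrases", unit="phrase"):
    # tqdm.write prints a status line without disrupting the progress bar
    tqdm.write(f"Processing phrase with id {item['id']}...")
    # ...processing logic here...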

Step 7: Processing Phrases and Storing Explanations

Inside the loop, use OpenAI to get explanations for each new phrase:

python
for item in tqdm(phrases, desc="Processing phrases", unit="phrase"):
    # Skip already processed phrases
    if any(exp["id"] == item["id"] for exp in explanations):
        continue

    phrase = item["phrase"]
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Explain the Italian phrase: {phrase}"}],
    )

    explanation = {"id": item["id"], "explanation": completion.choices[0].message.content}
    explanations.append(explanation)

    with open("italian-phrase-explanations.json", "w") as file:
        json.dump(explanations, file, indent=4)
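
A note on library versions: the call above uses the pre-1.0 openai package interface (openai.ChatCompletion.create). If pip installed openai 1.0 or later, that method no longer exists; you can either pin the older release (for example, pip install "openai==0.28") or switch to the newer client interface. A rough sketch of the equivalent request with the 1.x client is shown below; it assumes your key is available in the OPENAI_API_KEY environment variable:

python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment by default

phrase = "In bocca al lupo"  # illustrative; in the script this comes from the loop

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": f"Explain the Italian phrase: {phrase}"}],
)
print(completion.choices[0].message.content)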

Step 8: Finalizing the Script

Once all phrases are processed, output a completion message:

python
print("COMPLETE! Explanations written to italian-phrase-explanations.json")

Step 9: Running the Script

Run the script from your terminal:

bash
# On Windows:
python process-phrases.py

# On macOS and Linux:
python3 process-phrases.py

Output:

Check out the phrase explanations here. Note that each object contains markdown that can be rendered as HTML.
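
If you want to convert those explanations to HTML yourself, one option (not part of the script above) is the third-party markdown package; a minimal sketch:

python
import json

import markdown  # third-party package: pip install markdown

# Load the generated explanations and convert each markdown explanation to HTML
with open("italian-phrase-explanations.json", "r") as file:
    explanations = json.load(file)

html_snippets = [markdown.markdown(exp["explanation"]) for exp in explanations]
print(html_snippets[0])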

Complete Python Script Code

python
import openai
import json
from tqdm import tqdm

# OpenAI API key (use your own key; never commit a real key to version control)
openai.api_key = "your-api-key"

# Read the input JSON containing Italian phrases
with open("italian-language-phrases.json", "r") as file:
    phrases = json.load(file)

# Check if explanations file already exists, if not, initialize an empty list
try:
    with open("italian-phrase-explanations.json", "r") as file:
        explanations = json.load(file)
except FileNotFoundError:
    explanations = []

# Estimate Time Remaining
total_phrases = len(phrases)
estimated_time = total_phrases * 30  # Assuming 30 seconds per phrase
print(
    f"Starting to process {total_phrases} phrases. Estimated time: {estimated_time//60} minutes {estimated_time%60} seconds.\n"
)

# Progress bar using tqdm
for i, item in enumerate(tqdm(phrases, desc="Processing phrases", unit="phrase"), start=1):
    # If the item's explanation already exists, skip it
    if any(exp["id"] == item["id"] for exp in explanations):
        continue

    # Print a progress update for the current phrase
    print(f"\nProcessing phrase {i} of {total_phrases}...")

    phrase = item["phrase"]

    # Query OpenAI for an explanation
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "user",
                "content": f"Give me a brief linguistic / grammatical breakdown of the following Italian phrase (formatted in markdown): {phrase}. Please do not address the question or questioner in your response. Just deliver the explanation itself, as the entire text response will be used directly in some documentation. please format each explanation so that all Italian words are in **bold** and the corresponding English are *bold** words as well. The explanation should be in markdown format.",
            }
        ],
    )

    # Append the explanation to the explanations list
    explanation = {
        "id": item["id"],
        "explanation": completion.choices[0].message.content,
    }
    explanations.append(explanation)

    # Write the current explanation to the JSON file immediately
    with open("italian-phrase-explanations.json", "w") as file:
        json.dump(explanations, file, indent=4)

    print(f"Processed and saved explanation for phrase {i}.\n")

print(f"COMPLETE! Explanations written to italian-phrase-explanations.json")

Conclusion

Running this script inside a virtual environment gives you a reliable, isolated way to process Italian phrases with OpenAI's API, and it keeps your development environment clean and free of dependency conflicts.

Additional Tips

  • Deactivate your virtual environment when you're finished by typing deactivate in your terminal.
  • Consider maintaining a requirements.txt file for easy setup of the environment on different machines (see the commands below).
  • Regularly update your dependencies to stay current with the latest versions and security patches.
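
For the requirements.txt tip above, the typical commands look like this (run them inside the activated environment):

bash
# Capture the packages installed in this environment
pip freeze > requirements.txt

# Recreate the environment on another machine
pip install -r requirements.txt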

Setting up a virtual environment for your Python project makes the development process more organized and efficient, especially when integrating powerful tools like OpenAI's API.