Automate PII Redaction From Audio Files Using Node.js And AssemblyAI
In the age of data privacy, redacting Personally Identifiable Information (PII) from audio and video files is a crucial task for many applications. A recent tutorial by AssemblyAI outlines how to automate this process using Node.js and the AssemblyAI API.
Understanding PII and Its Importance
PII includes any data that can be used to identify an individual, such as names, phone numbers, and email addresses. Handling this information is governed by regulations like HIPAA, GDPR, and CCPA. Redaction matters wherever recordings capture sensitive details, for example a recorded phone conversation between a doctor and a patient.
Setting Up the Development Environment
To begin, ensure you have Node.js 18 or higher installed. Create a new project folder, navigate to it, and initialize a Node.js project:
mkdir pii-redaction
cd pii-redaction
npm init -y
Modify the package.json file to use ES Module syntax by adding "type": "module". Next, install the AssemblyAI JavaScript SDK:
npm install --save assemblyai
You'll need an AssemblyAI API key, which can be obtained from the AssemblyAI dashboard. Set this key as an environment variable on your system:
# Mac/Linux:
export ASSEMBLYAI_API_KEY=<YOUR_KEY>
# Windows:
set ASSEMBLYAI_API_KEY=<YOUR_KEY>
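A missing or empty environment variable is an easy mistake to make at this step. The following is a minimal sketch of a guard that fails early with a readable message. The helper name requireEnv is hypothetical and is not part of the AssemblyAI SDK.

```javascript
// Hypothetical helper (not part of the AssemblyAI SDK): read a required
// environment variable and fail early with a clear message if it is missing.
// The second parameter defaults to process.env and exists only so the
// function can be exercised with a plain object.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Environment variable ${name} is not set`);
  }
  // Trim stray whitespace that can sneak in when copying a key.
  return value.trim();
}
```

In index.js this could be used as `const apiKey = requireEnv("ASSEMBLYAI_API_KEY");` and the result passed to the AssemblyAI client constructor.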
Transcribing Audio with PII Redaction
With the environment set up, you can start transcribing audio files. Create a file named index.js and add the following code:
import { AssemblyAI } from 'assemblyai';

// The API key is read from the environment variable set earlier.
const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });

// Transcribe the sample file with PII redaction enabled.
const transcript = await client.transcripts.transcribe({
  audio: "https://storage.googleapis.com/aai-web-samples/architecture-call.mp3",
  redact_pii: true,
  // PII categories to redact from the transcript.
  redact_pii_policies: [
    "person_name",
    "phone_number",
  ],
  redact_pii_sub: "hash",
});

// Stop if the transcription failed.
if (transcript.status === "error") {
  throw new Error(transcript.error);
}

console.log(transcript.text);
This script transcribes an audio file while redacting the specified PII categories, here names and phone numbers. With redact_pii_sub set to "hash", the redacted text is replaced with hash characters (#).
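A quick way to spot-check the printed transcript is to scan it for strings that still look like phone numbers. The sketch below is an illustration only: the function name findPhoneLikeStrings is hypothetical, the pattern covers only US-style 10-digit numbers written in digits, and it will miss numbers that the transcript spells out in words. It is not a substitute for reviewing the output.

```javascript
// Hypothetical sanity check (not part of the AssemblyAI SDK): return every
// substring that looks like a US-style 10-digit phone number, for example
// 555-123-4567, (555) 123-4567, or 5551234567.
function findPhoneLikeStrings(text) {
  const phonePattern = /\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/g;
  // String.prototype.match returns null when nothing matches,
  // so fall back to an empty array.
  return text.match(phonePattern) ?? [];
}
```

Calling `findPhoneLikeStrings(transcript.text)` after the transcription step should return an empty array if no digit-form phone numbers remain in the text.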
Retrieving the Redacted Audio
To obtain the redacted audio, modify the code to include audio redaction settings:
import { AssemblyAI } from 'assemblyai';

const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });

const transcript = await client.transcripts.transcribe({
  audio: "https://storage.googleapis.com/aai-web-samples/architecture-call.mp3",
  redact_pii: true,
  redact_pii_policies: [
    "person_name",
    "phone_number",
  ],
  redact_pii_sub: "hash",
  redact_pii_audio: true,
  redact_pii_audio_quality: "mp3",
});

if (transcript.status === "error") {
  throw new Error(transcript.error);
}

console.log(transcript.text);
This configuration ensures that the redacted audio is available in MP3 format. The redacted audio file can be downloaded using the following code:
import { writeFile } from "fs/promises";
const { redacted_audio_url } = await client.transcripts.redactions(transcript.id);
const redactedFileResponse = await fetch(redacted_audio_url);
await writeFile("./redacted-audio.mp3", redactedFileResponse.body);
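The download step above writes the response body without checking whether the request succeeded, so a failed download could leave an error page saved under an .mp3 name. A slightly more defensive variant, sketched here under the assumption that the global fetch and Response from Node.js 18 or higher are in use, checks the HTTP status first. The helper name saveResponseToFile is hypothetical.

```javascript
import { writeFile } from "node:fs/promises";

// Hypothetical helper: write a fetch() Response body to disk, but only
// after confirming that the HTTP request succeeded.
async function saveResponseToFile(response, filePath) {
  if (!response.ok) {
    throw new Error(`Download failed with HTTP status ${response.status}`);
  }
  // Read the whole body into memory, then write it in one call.
  const bytes = Buffer.from(await response.arrayBuffer());
  await writeFile(filePath, bytes);
  // Return the number of bytes written so callers can log or verify it.
  return bytes.length;
}
```

It could be called as `await saveResponseToFile(await fetch(redacted_audio_url), "./redacted-audio.mp3");`. Buffering the full body is simple and adequate for short recordings; very large files may be better served by streaming.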
Executing the Script
Run the script in your shell:
node index.js
If successful, the console will display the redacted transcript, and a redacted audio file will be saved to your disk. The tutorial also provides an example of an unredacted transcript for comparison.
Conclusion
By following this tutorial, developers can efficiently redact PII from audio and video files using AssemblyAI and Node.js. For more details, visit the AssemblyAI blog.