In this post, I’ll walk you through the steps I took to build a system that ingests weather text files received from the GOES‑16 satellite via my custom homebrew ground station. These texts are part of the EMWIN package (Emergency Managers Weather Information Network), a system used by emergency managers and meteorologists to distribute real‑time weather information and warnings.
My setup processes these EMWIN weather texts, stores them in a MySQL database, and leverages OpenAI’s GPT model to generate human‑friendly summaries. Designed to run inside Docker containers with cron jobs handling regular updates, this guide will help you set up a similar workflow for your own weather data projects.
Overview
GOES‑16 provides full‑disk weather images and associated text products via an AWS S3 bucket. My goal was to:
- Fetch .TXT weather files from an S3 bucket (organized by date) for a specific zone.
- Extract the forecast text for that zone from the file.
- Parse and split the forecast into day segments (e.g., “TODAY”, “TONIGHT”, “SATURDAY”, etc.).
- Convert UTC dates to local time (CST) based on the forecast’s issuance timestamp so that North American users see the correct day.
- Insert the processed forecast segments into a MySQL database using an upsert strategy.
- Generate concise summaries using OpenAI’s GPT models.
- Schedule regular updates via cron running in a Docker container.
Below, I detail each of these steps.
Fetching Weather Texts from S3
My GOES‑16 data is stored in an S3 bucket under a prefix folder like kmob-texts/2025-02-07/. I used the AWS SDK for PHP to list and retrieve files. Read: Automating Data Uploads with Dynamic Directory Selection in Bash - GOES-16
Key Steps:
- List Date Folders: I used the S3 API’s listObjectsV2 with a delimiter (/) to list folders corresponding to dates.
- Select a Date: The system checks a GET parameter or defaults to today’s date (or the most recent available date).
- Identify the Target File: I then look for the “Zone Forecast Product” file (e.g., one named like A_FPUS54KMOB071445_C_KWIN_20250207144511_831106-3-ZFPMOBAL.TXT) in that folder. That product is specific to the Mobile, AL forecast office and covers several Gulf Coast areas, including Mobile, Baldwin, Washington, and Escambia counties. The data I extract is further narrowed to Baldwin Central, which includes the cities of Bay Minette, Daphne, Fairhope, Foley, and Spanish Fort.
// Load the AWS SDK for PHP (installed via Composer).
require 'vendor/autoload.php';

use Aws\S3\S3Client;

// Instantiate the S3 client.
$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'us-east-1',
]);

$bucket       = 'your_s3_bucket';
$basePrefix   = 'kmob-texts/';
$today        = date('Y-m-d');
$selectedDate = $_GET['date'] ?? $today;

// List available date folders.
$result = $s3->listObjectsV2([
    'Bucket'    => $bucket,
    'Delimiter' => '/',
    'Prefix'    => $basePrefix,
]);

$availableDates = [];
if (isset($result['CommonPrefixes'])) {
    foreach ($result['CommonPrefixes'] as $prefixData) {
        $folder = str_replace([$basePrefix, '/'], '', $prefixData['Prefix']);
        if (!empty($folder)) {
            $availableDates[] = $folder;
        }
    }
}

// Fall back to the most recent available date if the requested one is missing.
rsort($availableDates);
if (!in_array($selectedDate, $availableDates, true)) {
    $selectedDate = !empty($availableDates) ? $availableDates[0] : $today;
}

// List the objects for the selected date.
$datePrefix = $basePrefix . $selectedDate . '/';
$result = $s3->listObjectsV2([
    'Bucket' => $bucket,
    'Prefix' => $datePrefix,
]);

// Find the Zone Forecast Product file.
$targetKey = null;
foreach ($result['Contents'] ?? [] as $object) {
    $basename = basename($object['Key']);
    if (preg_match('/^A_FPUS54KMOB\d{6}_C_KWIN_\d{14}_\d+-\d+-ZFPMOBAL\.TXT$/i', $basename)) {
        $targetKey = $object['Key'];
        break;
    }
}
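With the key in hand, the file body can be downloaded with getObject; its contents become the $fileContent string parsed in the following sections. A minimal sketch (reusing the $s3, $bucket, and $targetKey variables from above):

```php
// Download the forecast file; the Body stream casts cleanly to a string.
if ($targetKey !== null) {
    $object = $s3->getObject([
        'Bucket' => $bucket,
        'Key'    => $targetKey,
    ]);
    $fileContent = (string) $object['Body'];
}
```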
Handling the Time Zone Issue
GOES‑16 operates on UTC, but North American (Gulf Coast) users expect local time (CST). My solution was to extract the issuance timestamp from the forecast file and convert it to a local date.
How I Did It:
Extract Issuance Timestamp:
The forecast file contains a line like:
941 PM CST Fri Feb 7 2025
- I used a regular expression to locate this line and PHP’s DateTime::createFromFormat() to parse it.
Convert & Override Date:
- Since the issuance string already includes “CST,” I extract the local date and override the selected date with this value.
Example code for parsing and conversion:
// Locate the issuance line, e.g. "941 PM CST Fri Feb 7 2025".
// Hour, minutes, meridiem, time zone, month, day, and year are captured
// separately; the day-of-week abbreviation is matched but not captured.
if (preg_match('/(\d{1,2})(\d{2})\s?(AM|PM)\s+([A-Z]{3})\s+[A-Za-z]{3}\s+([A-Za-z]{3})\s+(\d{1,2})\s+(\d{4})/i', $fileContent, $timeMatches)) {
    // Rebuild an unambiguous string like "9:41 PM CST Feb 7 2025".
    $issuanceStr = sprintf(
        '%d:%s %s %s %s %d %d',
        $timeMatches[1], $timeMatches[2], strtoupper($timeMatches[3]),
        strtoupper($timeMatches[4]), $timeMatches[5], $timeMatches[6], $timeMatches[7]
    );
    $issuedAt = DateTime::createFromFormat('g:i A T M j Y', $issuanceStr);
    if ($issuedAt) {
        // Override the selected date with the forecast's local issuance date.
        $localForecastDate = $issuedAt->format('Y-m-d');
        $selectedDate = $localForecastDate;
    }
}
Parsing and Splitting the Forecast Text
After fetching the file content and determining the correct local date, I extract the forecast for my zone (e.g., ALZ264) and split it into day segments.
- Extract the Forecast Section: Using a regular expression to extract the block of text for ALZ264.
- Split into Segments: Each segment starts with a line like .REST OF TODAY... or .TONIGHT.... I then split the text into an array of segments.
// Extract the ALZ264 block (it ends at the next zone header or the "$$" terminator).
if (preg_match('/^(ALZ264[^\n]*\n(?:.*\n)*?)(?=^[A-Z]{3}\d{3}-|\$\$)/m', $fileContent, $matches)) {
    $extractedZoneForecast = trim($matches[1]);
} else {
    $extractedZoneForecast = 'Zone forecast for ALZ264 not found in the file.';
}
$dayForecasts = [];
// Named groups capture the day label and the forecast text that follows it.
if (preg_match_all('/\.(?<day>[A-Z ]+)\.\.\.(?<forecast>.*?)(?=\.[A-Z ]+\.\.\.|$)/s', $extractedZoneForecast, $matches)) {
    foreach ($matches['day'] as $i => $day) {
        $dayForecasts[] = [
            'day'      => trim($day),
            'forecast' => trim($matches['forecast'][$i])
        ];
    }
}
I also replace "REST OF TODAY" and "REST OF TONIGHT" with "TODAY" and "TONIGHT", respectively.
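To see the splitting and normalization in isolation, here is a self-contained sketch run against a shortened, made-up forecast block (the sample text is illustrative, not a real product):

```php
<?php
// Demo of the split-and-normalize step on a made-up sample block.
$sample = ".REST OF TODAY...Sunny, with a high near 72. North wind 5 to 10 mph.\n"
        . ".TONIGHT...Clear, with a low around 48.\n"
        . ".SATURDAY...Mostly sunny, with a high near 70.\n";

$dayForecasts = [];
if (preg_match_all('/\.(?<day>[A-Z ]+)\.\.\.(?<forecast>.*?)(?=\.[A-Z ]+\.\.\.|$)/s', $sample, $m)) {
    foreach ($m['day'] as $i => $day) {
        $dayForecasts[] = [
            // "REST OF TODAY"/"REST OF TONIGHT" become plain "TODAY"/"TONIGHT".
            'day'      => str_replace(['REST OF TODAY', 'REST OF TONIGHT'], ['TODAY', 'TONIGHT'], trim($day)),
            'forecast' => trim($m['forecast'][$i]),
        ];
    }
}
```

This yields three segments labeled TODAY, TONIGHT, and SATURDAY, each paired with its trimmed forecast text.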
Ingesting Data into MySQL (Using Upsert)
I use PHP’s PDO to insert the forecast segments into my MySQL database. Using an upsert strategy ensures that if multiple forecast files are received in a day (e.g., a morning and an afternoon update), only the most recent data is stored.
Database Schema Example:
CREATE TABLE zone_forecasts (
    id INT AUTO_INCREMENT PRIMARY KEY,
    zone VARCHAR(10) NOT NULL,
    forecast_date DATE NOT NULL,
    forecast_day VARCHAR(50) NOT NULL,
    forecast_text TEXT NOT NULL,
    issued_at DATETIME NULL,
    UNIQUE KEY uniq_forecast (zone, forecast_date, forecast_day)
);
Insertion Code Snippet:
$zoneCode = 'ALZ264';
$insertMessages = [];

// Connect with exceptions enabled so failures surface immediately.
$dsn = "mysql:host=" . DB_HOST . ";dbname=" . DB_NAME . ";charset=utf8mb4";
$pdo = new PDO($dsn, DB_USER, DB_PASS);
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Upsert: the unique key on (zone, forecast_date, forecast_day) means a
// later file for the same day simply overwrites the earlier text.
$stmt = $pdo->prepare("
    INSERT INTO zone_forecasts (zone, forecast_date, forecast_day, forecast_text)
    VALUES (:zone, :forecast_date, :forecast_day, :forecast_text)
    ON DUPLICATE KEY UPDATE forecast_text = VALUES(forecast_text)
");

foreach ($dayForecasts as $dayForecast) {
    // Normalize "REST OF TODAY"/"REST OF TONIGHT" to plain day labels.
    $forecast_day = trim($dayForecast['day']);
    $forecast_day = str_replace(
        ['REST OF TODAY', 'REST OF TONIGHT'],
        ['TODAY', 'TONIGHT'],
        $forecast_day
    );
    $forecast_text = trim($dayForecast['forecast']);

    $stmt->execute([
        ':zone'          => $zoneCode,
        ':forecast_date' => $selectedDate,
        ':forecast_day'  => $forecast_day,
        ':forecast_text' => $forecast_text,
    ]);
    $insertMessages[] = "Inserted/Updated forecast for {$forecast_day}";
}

$finalInsertMessage = "Operation complete: " . implode(", ", $insertMessages);
error_log($finalInsertMessage);
echo $finalInsertMessage;
Generating Summaries with OpenAI
Once the forecast data is in the database, I built a view that retrieves the data and uses my OpenAIChatbot class to generate summaries. I structured the prompt using a heredoc format with clear instructions and an “IMPORTANT” section to guide the summarization.
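The view itself isn't shown here, but as a sketch (assuming the same DB_* constants and zone_forecasts schema as above), pulling the day's segments and isolating the TODAY text for the prompt might look like:

```php
// Sketch only: retrieve today's segments for ALZ264 and pick out the
// TODAY text that gets interpolated into the prompt as $todayText.
$dsn = "mysql:host=" . DB_HOST . ";dbname=" . DB_NAME . ";charset=utf8mb4";
$pdo = new PDO($dsn, DB_USER, DB_PASS, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

$stmt = $pdo->prepare("
    SELECT forecast_day, forecast_text
    FROM zone_forecasts
    WHERE zone = :zone AND forecast_date = :forecast_date
    ORDER BY id
");
$stmt->execute([':zone' => 'ALZ264', ':forecast_date' => date('Y-m-d')]);

$todayText = '';
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    if ($row['forecast_day'] === 'TODAY') {
        $todayText = $row['forecast_text'];
    }
}
```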
Prompt Structure Example:
$promptToday = <<<PROMPT
You are a meteorologist specializing in weather forecast analysis for zone ALZ264.
Below is the detailed forecast for TODAY:
{$todayText}
Provide a structured summary in standard paragraphs with the following sections:
Observations:
Summarize the key weather conditions, including temperature, wind, and precipitation details for today.
Trends:
Highlight any noticeable trends or changes expected during the day.
Recommendations:
Offer any advisories or recommendations for today (for example, warnings about dense fog or high winds).
IMPORTANT:
- Do not use asterisks, bullet points, or dashes unless they are part of numeric data or proper names.
- Write in standard paragraphs.
- Keep your response succinct.
PROMPT;
A similar prompt is built for upcoming days. The OpenAIChatbot class sends these prompts to the API, and the responses are then stored/displayed.
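My OpenAIChatbot class is a thin wrapper, so its internals aren't reproduced here, but under the hood the request it makes is a standard Chat Completions call. A rough sketch using cURL (the model name is illustrative):

```php
// Sketch of the underlying API request; requires OPENAI_API_KEY in the env.
$payload = json_encode([
    'model'    => 'gpt-4o-mini',
    'messages' => [
        ['role' => 'user', 'content' => $promptToday],
    ],
]);

$ch = curl_init('https://api.openai.com/v1/chat/completions');
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $payload,
    CURLOPT_HTTPHEADER     => [
        'Content-Type: application/json',
        'Authorization: Bearer ' . getenv('OPENAI_API_KEY'),
    ],
]);
$response = curl_exec($ch);
curl_close($ch);

// The summary text lives in choices[0].message.content.
$decoded = json_decode($response, true);
$summary = $decoded['choices'][0]['message']['content'] ?? '';
```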
Scheduling with Cron in Docker
I ran the entire ingestion script inside a Docker container. By installing cron in the container and setting up a cron job (e.g., using a crontab entry like 0 * * * * php /var/www/html/zone_forecast_insert.php >> /var/log/zone_forecast.log 2>&1), the system automatically ingests new forecast files as they arrive.
NOTE: The above script location was for my testing purposes only. Store your CLI-only scripts in a directory that isn't accessible via the web server, and then reference them in your cron job.
For debugging, I verified that cron was running by exec’ing into the container and checking the process list (ps aux | grep cron) and by reviewing the cron log file.
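For reference, a minimal Dockerfile sketch for this setup (Debian-based PHP image assumed; the script path is illustrative and should live outside the web root, per the note above):

```dockerfile
FROM php:8.2-apache

# MySQL driver for PDO plus cron for scheduling.
RUN docker-php-ext-install pdo_mysql \
    && apt-get update && apt-get install -y cron \
    && rm -rf /var/lib/apt/lists/*

# Run the ingestion script at the top of every hour.
RUN echo "0 * * * * php /opt/scripts/zone_forecast_insert.php >> /var/log/zone_forecast.log 2>&1" \
    | crontab -

# Start cron alongside Apache.
CMD cron && apache2-foreground
```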
Conclusion
By combining these approaches, I built a scalable, automated system that:
- Fetches GOES‑16 weather texts from an S3 bucket,
- Correctly groups and processes forecast data based on the local issuance timestamp,
- Inserts/upserts the forecast data into a MySQL database,
- Uses OpenAI to generate clear, structured summaries,
- And schedules the entire process using cron inside a Docker container.