Meteorology Ain't Easy

Whenever a forecast doesn't pan out as expected, many non-meteorologists are quick to criticize meteorologists for getting it wrong. Oftentimes, meteorologists respond to such criticism by explaining that weather forecasting is challenging. Usually, the case is made by showing some complicated-looking equations, but I'm going to attempt a more practical explanation of why forecasting is hard. Sure, I could go through all the vector calculus involved in meteorological equations. Sure, I could talk about complicated equations like the quasi-geostrophic equations, the vorticity tendency equation, and the atmospheric equations of motion, but so many people have already offered these up as a defense. I find this counterargument weak, because there are many other professions that involve complex math, yet those professions aren't the subject of regular complaints about inaccuracies. My goal is to explain, from a more practical standpoint, why forecasting the weather is so difficult even in this modern day and age, and it has little to do with high-level math.

Part One: Observations are not perfect

Anyone who has ever looked at a weather map on television probably knows (in the back of their mind, at least) that there is a vast array of weather observation stations set up all around the country. Sure enough, these observation stations are often important when handling a significant weather event on the day of the event and, to some extent, the day before it. What many people don't realize is that these observations always contain some degree of error, and they don't always tell the whole story.

I remember working a flash flooding event in the Central Plains, using observational data to pinpoint where a surface front was. After all, the positioning and orientation of the front would determine where the heaviest rainfall was likely to occur. When going through the surface observations, there was one station in Oklahoma reporting a temperature of 97 °F and a dewpoint of 97 °F. For perspective, a dewpoint temperature of 97 °F would be a world record. Was the dewpoint actually 97 °F? Of course not. Did the computers know the data was invalid? No. In fact, the computer-generated plots showed a ridiculously high amount of moisture around that one observation station, because it is hard for computer algorithms to reliably flag bad data.
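
To make the idea concrete, here is a minimal sketch of the kind of sanity check a plotting routine could apply before trusting an observation. The function name and the 95 °F plausibility cutoff are my own illustrative assumptions; real quality-control systems are far more sophisticated.

```python
def plausible_obs(temp_f, dewpoint_f, max_plausible_dewpoint_f=95.0):
    """Return False if a surface observation is physically implausible."""
    if dewpoint_f > temp_f:
        # Dewpoint can never exceed the air temperature.
        return False
    if dewpoint_f > max_plausible_dewpoint_f:
        # Higher than any dewpoint plausibly observed near the surface
        # (rough cutoff; the exact value here is an assumption).
        return False
    return True

# The Oklahoma station from the example above: 97 °F with a 97 °F dewpoint.
print(plausible_obs(97.0, 97.0))  # False -> flag it instead of plotting it
```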

Alright, so that's an admittedly outlandish example. The vast majority of weather stations are not that inaccurate. However, all weather stations have some degree of inaccuracy in the values they report. Suppose the reported temperature outside is 33 °F. In actuality, the outside temperature is not exactly 33 °F. It might be something like 33.01 °F, 32.99 °F, or even 33.000001 °F. It's statistically impossible for the temperature to be exactly 33 °F. It also turns out that most thermometers have an observational error of about ±2 °F. This means that if the reported temperature is 33 °F, there is a 95% chance that the actual temperature lies between 31 °F and 35 °F. If the observational error is ±3 °F and the reported temperature is 33 °F, then there is a 95% chance that the actual temperature lies between 30 °F and 36 °F. Being off by a few degrees may not seem like a big deal, and it usually isn't, unless you're dealing with a potential winter weather event.

Let's say rain is falling and the outside temperature is reported to be 31 °F (±2 °F). Let's also say you're expecting widespread 1.00"+ rain totals in this same general region. Applying the same logic as above, there is a 95% chance that the actual temperature is between 29 °F and 33 °F. That means there's a reasonable chance the air temperature is actually above freezing, even though the station is reporting a temperature below freezing. If the temperature outside truly is 31 °F, you can expect a catastrophic freezing rain event. If the temperature outside truly is above freezing, it's a forgettable cold rain. Now imagine trying to use that observation to make a forecast. If you predict a catastrophic freezing rain event, but the actual air temperature is above freezing, you're going to look really silly. But if you say it's just a cold rain and the temperature is actually below freezing, everyone will blame you for not properly informing them. And that doesn't even account for the 5% chance that the temperature is greater than 33 °F or less than 29 °F.
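
For what it's worth, here's a rough sketch of how you could put a number on that uncertainty, if you're willing to treat the ±2 °F band as the 95% interval of a normal distribution (that normality assumption is mine, purely for illustration):

```python
from statistics import NormalDist

reported_f = 31.0
half_width_95 = 2.0              # the ±2 °F band, treated as a 95% interval
sigma = half_width_95 / 1.96     # implied standard deviation

dist = NormalDist(mu=reported_f, sigma=sigma)
p_above_freezing = 1.0 - dist.cdf(32.0)
print(f"Chance the true temperature is above 32 °F: {p_above_freezing:.0%}")  # roughly 16%
```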

To make matters even more interesting, it is impossible to know (with absolute certainty) what the weather conditions are at every point on the planet. To help illustrate this, let's consider two observation stations in Kansas. One station is located in Wichita, and the other is located in Newton (about 25 miles north of Wichita). Let's say you want to know the temperature in Sedgwick, which lies somewhere between Wichita and Newton. There's no observation station in Sedgwick, so how will you determine the temperature there? In short, there is no way to know for sure. You can make an educated guess, but you cannot know with absolute certainty. In fact, even if you had a weather station there, there is no guarantee it would be reporting the correct temperature (as detailed in the paragraph above).
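
One way to make that educated guess is something like inverse-distance weighting, sketched below. The station values and distances are made up for illustration; they are not real observations.

```python
def idw_estimate(observations, power=2):
    """observations: list of (temperature_f, distance_miles) pairs."""
    weights = [1.0 / dist**power for _, dist in observations]
    weighted_sum = sum(w * temp for w, (temp, _) in zip(weights, observations))
    return weighted_sum / sum(weights)

# Hypothetical readings: Sedgwick sits roughly between the two stations.
wichita = (88.0, 12.0)   # 88 °F reported, ~12 miles from Sedgwick
newton = (84.0, 13.0)    # 84 °F reported, ~13 miles from Sedgwick
print(f"Estimated Sedgwick temperature: {idw_estimate([wichita, newton]):.1f} °F")
```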

This may not seem like a big deal for surface observations, since there are thousands of weather stations out there reporting weather conditions on an hourly basis (some report more often than hourly). And, it turns out, we often get a very good picture of what's happening at ground level. However, things become very problematic when trying to obtain upper air observations. The only consistent method for obtaining upper air observations is launching weather balloons. Some aircraft also record what's happening, but they're only reporting conditions at their specific altitude and position. Many aircraft are en route to somewhere else, so the observations they report represent only a single level in the atmosphere, strung out over some distance. As for weather balloons, on any given day these are launched every 12 hours from around 100 stations across the United States. This presents a real problem, because the resolution of upper air observations is very poor, so a lot of important details can easily be missed. Also, these are launched only every 12 hours. A lot can happen in the upper atmosphere during that timeframe that will influence the overall weather pattern.

A curious reader might ask, "Why not just launch more weather balloons?" Weather balloons and the instrumentation they carry are expensive. Each launch requires about $1000 worth of balloon, instrumentation, and helium. Launching 100 balloons twice a day already gives you a daily cost of $200,000. Recall that surface observations are taken hourly and there are thousands of them. To get an upper air observation for every single ground observation would cost over $20,000,000 per day.

Dealing with the gaps in upper air observations is challenging, and it's also a major problem. What happens in the upper atmosphere can heavily influence what's going to happen at the ground. For instance, if the upper air winds are all blowing in the same direction and running parallel to a surface front, there could be a major threat for flash flooding. If the winds are veering and intensifying with height, you might have to worry about severe thunderstorms and tornadoes. If you've got a layer of above-freezing air on top of a subfreezing layer near the ground, you might have to worry about freezing rain and sleet. If you've got a very warm and dry layer right above the ground, you might have to worry about fog. And these are merely the weather conditions that concern those on the ground. What about conditions that matter to aviation? Where is the jet stream that will improve (or worsen) an airplane's fuel efficiency? Where is the worst turbulence located? Where might we have to worry about an airplane's wings icing over? And, perhaps most importantly, airplanes need to know what the vertical temperature profile looks like in order to properly tune their altimeters.

There are two very common techniques to help fill in the gaps in upper air data. One is to use satellite imagery to try to determine what the winds are like at different levels of the atmosphere. However, it is often difficult to determine upper-atmosphere temperatures using only satellite data. Another technique is to use short-term model forecasts to predict what the upper atmosphere could look like. But, as the following section shows, models are not a perfect solution either.

Part Two: Models are not perfect

When forecasting the weather 48+ hours in advance, the only viable option is to use computer models to try and determine what the state of the atmosphere will be at some future time. Now, there are a lot of misconceptions surrounding weather forecast models, so let's just go ahead and address those right now.

(1) It is currently impossible to exactly resolve the state of the atmosphere. As powerful as today's computers are, it is still impossible for us to resolve every single nitty-gritty detail of the atmosphere. If you wanted a model with nearly perfect accuracy, you would have to worry about the microphysical properties of matter and get down to the molecular level to determine how individual air molecules interact with each other. That means we would need a model grid with a resolution of nanometers. Most global models currently operate on a 24 km grid resolution. That is, every grid point on the model represents a 24 km x 24 km box, and even that is a challenge for modern-day computers to handle in a timely manner. To get a better idea of what exactly we're talking about, here's a rough estimate of how much memory is needed to handle a current global model:

(1000 latitude grid points) * (1000 longitude grid points) * (30 vertical grid points) * (60 timestamps) * (64 bits / data point) = 13.411 GB

And that's for one variable on one global model. For perspective, the average desktop computer has about 16 GB of RAM, and a lot of that is already used by the operating system.
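
As a quick sanity check on the arithmetic above, here's the same back-of-the-envelope estimate written out as a short script (the grid counts are the rough numbers used above, not the specs of any particular model):

```python
lat_points = 1000
lon_points = 1000
vertical_levels = 30
timestamps = 60
bits_per_value = 64

total_bits = lat_points * lon_points * vertical_levels * timestamps * bits_per_value
total_gib = total_bits / 8 / 1024**3          # bits -> bytes -> GiB
print(f"One variable, one model: about {total_gib:.3f} GB")   # ~13.411
```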

Let's repeat the above calculation, but this time for a nanometer grid resolution. To get this level of resolution, we would need roughly 40,003,600,000,000,000 grid points in both the latitude and longitude directions and roughly 10,000,000,000,000 grid points in the vertical direction. And, if you wanted to resolve that for every microsecond over the same length of time as a global model run (10 days), you would have about 864,000,000,000 different timestamps. Let's see how much memory a computer would need to perform this calculation:

(40,003,600,000,000,000 latitude grid points) * (40,003,600,000,000,000 longitude grid points) * (864,000,000,000 timestamps) * (64 bits / data point) = 1.059 x 10^32 YB (yottabytes), or 1.164 x 10^47 GB.

If your average desktop computer contained 16 GB of RAM, you would need 7,280,000,000,000,000,000,000,000,000,000,000 desktop computers to have enough memory. If you take the world population to be around 7 billion, that would mean you could give 1,040,000,000,000,000,000,000,000 desktop computers to everyone on the planet. But, we still haven't gotten to the best part yet! That still wouldn't perfectly resolve the physics of the atmosphere, because there are limits to how precise a computer can be when storing individual numbers in its memory.

In short, there is simply no way we can possibly resolve the fine details needed to get a nearly perfect prediction from a computer model. That is why most numerical models use techniques to roughly estimate the microphysical properties of the atmosphere and its contents. Of course, these estimations mean sacrificing some accuracy in order to conserve memory. But if we want any model output at all, it's a necessary sacrifice. That is one reason why all forecast models produce slightly different outputs: not only are the grid resolutions different, but the techniques used to estimate microphysical properties are also different.

(2) It is currently impossible for computers to exactly evaluate certain mathematical expressions. For simple arithmetic (and even algebra), computers generally do a good job of resolving the answers exactly. However, the equations that govern the atmosphere involve calculus, and those expressions cannot, in general, be evaluated exactly by a computer. Their solutions have to be approximated. Approximations naturally introduce some degree of error, which becomes amplified as the model gets deeper into the calculations that follow. Anyone who has taken a college-level meteorology class knows it's hard to find an equation that doesn't contain some form of calculus. So, even if you had the memory and processing power to handle all of the data, you still wouldn't get a completely accurate solution, because some degree of error is unavoidable.
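
As a toy illustration of what "approximating calculus" looks like in practice, here's a forward-difference estimate of a derivative. Real models use far more sophisticated numerical schemes, but the principle is the same: the answer is never exact, and the leftover error feeds into everything computed afterward.

```python
import math

def forward_difference(f, x, h):
    """Approximate f'(x) with a one-sided finite difference."""
    return (f(x + h) - f(x)) / h

x = 1.0
exact = math.cos(x)   # the true derivative of sin(x)
for h in (0.1, 0.01, 0.001):
    approx = forward_difference(math.sin, x, h)
    print(f"h={h:<6} approx={approx:.6f} error={abs(approx - exact):.2e}")
```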

(3) The data being plugged into models contains errors. Remember the previous section about how observations contain some degree of error? Well, at some point that observational data has to be plugged into a model. If you plug flawed data into a model, the result you get back will also contain flaws, even if the model perfectly captures the actual physical processes. Consider the following: you're embarking on a road trip from Dallas to Oklahoma City, two cities separated by roughly 200 miles. The speed limit on I-35 is about 70 mph (it changes as you cross the Red River and go through construction zones). You take a reading from your speedometer, which is going to be inaccurate to some degree. It tells you 70 mph, but let's say your actual speed is 71.25 mph. If you then go to calculate an ETA:

ETA (measured): (200 miles) / (70 miles per hour) = 2.857 hours
ETA (actual): (200 miles) / (71.25 miles per hour) = 2.807 hours

So, your calculated ETA is off by about 3 minutes. That may not seem like much, but it's a small error that becomes more amplified with time. This is easier to see if we consider a distance of 1000 miles instead of 200 miles.

ETA (measured): (1000 miles) / (70 miles per hour) = 14.286 hours
ETA (actual): (1000 miles) / (71.25 miles per hour) = 14.035 hours

Driving for a longer time, your ETA is now off by about 15 minutes. What started as a small error has become much more noticeable. It doesn't matter how perfect your model is: if the data you put into the model contains errors, the results will also contain errors. And to make things even more interesting, weather models aren't perfect and they take in data that isn't perfect, meaning their outputs can contain significant errors.
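
Here's the same road-trip arithmetic in a short script, just to show how the error grows with distance even though the speedometer error itself never changes:

```python
measured_speed = 70.0    # mph, what the speedometer says
actual_speed = 71.25     # mph, the true speed

for distance_miles in (200, 500, 1000):
    eta_measured = distance_miles / measured_speed
    eta_actual = distance_miles / actual_speed
    error_minutes = (eta_measured - eta_actual) * 60
    print(f"{distance_miles:>4} miles: ETA off by about {error_minutes:.0f} minutes")
```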

Alright, so just using models isn't going to be accurate, and just using observations isn't going to be accurate. That means meteorologists are pretty much set up for failure with every single forecast. The model is wrong to some degree and the observations are wrong to some degree. How do meteorologists get anything right at all? Fortunately, there are techniques that can be used to help minimize the error of a forecast.

One technique involves using as many sources of data as possible. If they are all in reasonable agreement, then there is a high degree of likelihood (close to, but not quite, 100%) that the outputs can be trusted. But what happens when the observations and models disagree? Or, even more often, what happens when the individual models disagree with each other (there are over 25 different forecast models in operation as of this writing)? Which model is correct? What if none of them appear to be correct? What if the models are at polar opposites (e.g., one calls for a major tornado outbreak and the other says sunny skies)? Also, there is a limit to how much information a human can process in a timely and efficient manner. As such, adding more sources of information is not necessarily going to result in a more accurate forecast.

Another popular technique is to use ensembles. An ensemble basically takes many different individual model runs and calculates an average between all of the models. As an example, suppose you have 5 models predicting a rain event for Birmingham, Alabama. One model predicts 0.25", one predicts 0.00" (no rain), one predicts 2.00", one predicts 1.25", and the other predicts 0.75". The ensemble average would be 0.85". But, there's one thing that should be noticed right away: One of the models said no rain at all, meaning there's a chance that the ensemble average itself is very wrong. Let's consider a similar scenario between 5 model runs: 0.75", 0.80", 0.85", 0.90", 0.95". The ensemble mean in this scenario is also 0.85", but every single model is calling for at least 0.75" of rain. In this case, the range of possible outcomes (assuming the models are reasonably accurate) is much narrower than the first case. This usually means there is more "confidence" in a forecast, while a larger "spread" in the ensemble suggests a low "confidence" forecast. For any given weather event, though, there is often a wide range of possible outcomes and it's not always easy to know which outcome is the most probable.
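
Here's that comparison as a quick sketch, using the two hypothetical sets of Birmingham rainfall forecasts from the paragraph above (the standard deviation is just one simple way to quantify "spread"):

```python
from statistics import mean, stdev

scattered = [0.25, 0.00, 2.00, 1.25, 0.75]   # members disagree wildly
clustered = [0.75, 0.80, 0.85, 0.90, 0.95]   # members nearly agree

for name, members in (("scattered", scattered), ("clustered", clustered)):
    print(f'{name}: mean = {mean(members):.2f} in, spread (std dev) = {stdev(members):.2f} in')
```

Both sets share the same 0.85" mean, but the much smaller spread in the second set is what justifies the higher "confidence."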

Part Three: Hazard-Specific Challenges

I. Severe Thunderstorms and Tornadoes

        a. Threats for large hail, damaging winds, and tornadoes depend heavily on storm configuration. If you've ever read any technical forecast discussions, you've probably encountered the terms "storm mode", "linear mode", or "isolated mode" before. Simply put, this refers to the characteristics of the storms themselves. An isolated storm is a storm all by itself with no other precipitation around it. A linear storm can be thought of as a "line of storms," or a continuous line of precipitation. Isolated storms are more likely to produce strong to violent tornadoes and very large hail, while linear storm modes are much more likely to produce straight-line wind damage and flooding. As such, determining storm mode is important when making a forecast for severe weather. There are clear-cut cases of environments favorable for isolated modes and environments favorable for linear modes, but many cases are far murkier. In fact, one important factor in determining linear versus isolated modes is the strength of upward motion in the atmosphere. This is not something that can be directly measured; it can only be estimated. If you underestimate the strength of vertical motion, you might get linear modes when you expected isolated modes, and your forecast for a tornado outbreak might bust. Likewise, if you overestimate the strength of vertical motion, you might get isolated modes, and you might end up with a "surprise tornado outbreak" or maybe even no storms at all.

        b. Storm interactions and small-scale details can be important, but are difficult for models to resolve. When storms go up on a potential severe weather day, they sometimes change the behavior of other storms that happen to be nearby. One of the better examples of this was the tornado event of April 3rd, 2012. A complex of thunderstorms had formed in Oklahoma and left behind a pool of rain-cooled air (a "cold pool," whose leading edge is known as an "outflow boundary"). This feature did not show up on forecast models, because such boundaries are very difficult for models to predict. As it turns out, that outflow boundary ended up stalling just south of the Dallas/Fort Worth metroplex. Two supercell thunderstorms formed along that boundary just after lunchtime, and each produced a large tornado in a very densely populated area. What began as a run-of-the-mill severe weather day with a relatively harmless squall line ended up becoming one of the more significant tornado events to occur in the Dallas/Fort Worth area, and it was all because of a feature the models never expected to be there. A similar "surprise outbreak" in the same general area happened on May 15th, 2013, when an EF4 tornado killed six people in Granbury, TX.

        c. Severe weather events often happen on a very localized level, leading to very high false alarm rates. Whenever a forecast is made for tornadoes and severe thunderstorms, that forecast generally involves drawing a large area where conditions are expected to be favorable for dangerous storms. Such areas might cover hundreds of thousands of square miles and include millions of residents. Do tornadoes cover hundreds of thousands of square miles? Of course not; they're tiny compared to these risk areas. This means that well over 99% of the people in a risk area are not affected by a tornado, and that often holds true even for a major tornado event like April 27th, 2011. Think about that for a minute: you've drawn a risk area and, in doing so, you've told 1,000,000 people they might be hit by a tornado today. Once the day is over, "only" 1,000 people were actually affected. That means you've sounded a "false alarm" for 999,000 people. In fact, if you're forecasting a tornado outbreak over a general region, there is a good chance that many people in the risk area won't see any rain at all, because most tornado outbreaks involve isolated storms.
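
Here's that arithmetic spelled out (the population numbers are the same round figures used above, not statistics from any real event):

```python
people_in_risk_area = 1_000_000
people_actually_affected = 1_000

false_alarm_fraction = (people_in_risk_area - people_actually_affected) / people_in_risk_area
print(f"Share of the risk area that effectively got a false alarm: {false_alarm_fraction:.1%}")  # 99.9%
```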

II. Winter Weather Events

        a. The outcome can be extremely sensitive to many different details. Remember the example we looked at concerning how observations are not perfect? Remember how 2 °F-3 °F could mean the difference between a harmless cold rain and a catastrophic freezing rain event? It also turns out that the locations of strongest rising motion (again, something that can't be directly measured) frequently determine who gets the most precipitation. Sometimes one town gets 10" of snow while a town about 10 miles away gets a light dusting. To further complicate matters, the temperature and moisture levels will tell you something about the type of precipitation you're likely to witness. Temperatures closer to freezing at the ground might mean heavy, wet snowflakes that are more capable of damaging trees, houses, and power lines. Very cold temperatures mean drier, powdery snow, which is easier for strong winds to blow around and produce blizzard conditions. Also, on a more technical note, "cold precipitation production" is one of the most complicated microphysical processes that occurs in the atmosphere, and it's impossible for models to definitively lock down.

        b. The future is just as important as the present. Oftentimes, the severity of a winter weather event is accentuated by what happens after all of the precipitation has fallen. If the temperature warms to above freezing, the ice and snow have a chance to melt, perhaps helping to clear the roads. If the temperature largely stays below freezing, then whatever accumulates will stay there. This may not seem like a big deal to Northern states that receive large amounts of snow on a regular basis, but it is a huge deal to Southern states where winter weather resources are a lot more limited. A high-profile case of this was the "Cobblestone Ice Storm" that hit Fort Worth and Denton in December 2013. Many areas received over half an inch of freezing rain and three inches of sleet accumulation. As people drove around on the sleet and marginal ground temperatures caused some partial melting, very lumpy surfaces of ice developed on the roads. At night, the temperature would fall well below freezing, causing the lumpy slush to refreeze and fuse to the road surface, making it impossible to drive on and impossible to remove. The ice persisted for almost 7 days, effectively shutting down the metroplex for that length of time. If the temperature had risen to, say, 50 °F after all the precipitation had fallen, the ice would have melted completely and things would have returned to normal. On a side note, after a heavy precipitation event, a rapid warm-up can be just as devastating, because it can result in major flooding, which is what happened in Nebraska in March 2019. Thing is, if you want to know when the temperature gets above freezing, you may have to trust model outputs that become increasingly inaccurate the deeper you look into the future.

        c. Exact precipitation amounts can be extremely difficult to predict accurately. This was alluded to in an earlier paragraph, but it's worth reiterating: one of the most complicated microphysical processes in the atmosphere involves the formation and behavior of solid precipitation (ice crystals and snow). Water content and temperatures determine how small or large the snowflakes will be, and the efficiency of precipitation production depends heavily on where the "dendritic growth layer" is. Put simply, the dendritic growth layer is wherever in the atmosphere the temperature is between about -10 °C and -20 °C. If this layer is located in the middle troposphere where friction is almost non-existent (usually where the pressure is around 500 mb), precipitation production will be very efficient. Otherwise, precipitation production may be less efficient. As if that weren't enough, remember that we're talking about a microphysical process involving objects smaller than 1 mm in diameter. Most models have a grid resolution many orders of magnitude larger, meaning they can't resolve the fine details of the ice crystals that will become snow. Rough estimates have to be made, which can naturally result in significant errors, even for a very short-term forecast.
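
As a toy example of one of the things a forecaster looks for, here's a sketch that picks out the dendritic growth layer from a handful of sounding levels. The pressure levels and temperatures are invented for illustration; a real sounding has far more levels.

```python
# (pressure_mb, temperature_c) pairs from a made-up sounding
sounding = [
    (850, -2.0),
    (700, -8.5),
    (600, -14.0),
    (500, -21.5),
    (400, -33.0),
]

dgl = [(p, t) for p, t in sounding if -20.0 <= t <= -10.0]
print("Levels inside the dendritic growth layer:", dgl)   # [(600, -14.0)]
```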

III. Heavy Rain Events

        a. Rainfall amounts are difficult for models to resolve. Many forecast models include an algorithm that estimates how much rainfall is expected over a given time at various points on the model grid. Rainfall totals are sensitive to many factors that are not well resolved. One is the fact that raindrops are far too small for current models to properly handle. Another crucial factor that is difficult for models to resolve is the exact behavior of storms, especially those triggered by small-scale atmospheric features like outflow boundaries (mentioned under severe weather).

        b. A large-scale pattern that favors a heavy rain event is very similar to patterns that favor squall lines, which usually don't produce as much rain. Remember how we talked about storm mode (linear vs. isolated)? Well, if you have strong winds running parallel to a front, you're going to get linear storm modes. For a flash flooding event, individual storms within the line have to track over the same area. To help understand this, imagine a long line of 5 storms all lined up to move over the same area. If each storm drops one inch of rain, having 5 storms move over the same area results in 5 inches of rain, which is likely to produce major flooding problems (a short sketch of this arithmetic appears after this section). This is often referred to as "training thunderstorms" or, the more technical term, "training convection". Now, here's where things get interesting: imagine you have a line of storms all moving from south to north, effectively tracking over the same area. Suddenly, a squall line begins to organize, causing the individual storms to track from west to east. You no longer have thunderstorms moving over the same areas. In fact, now you've got a damaging wind threat to worry about. I've personally worked many flash flooding events, and one of the most common failure modes I've encountered involves squall lines developing and lessening the flash flooding risk.

        c. The threat for flash flooding depends on topography, soils, and vegetation. Ever wonder why flooding events are relatively uncommon in the Southeast even though it rains a lot more frequently there? The reason largely has to do with the landscape and vegetation. Most soils in the Southeast can absorb a lot of water before they become saturated, meaning runoff will be very slow to form. Also, some of the most densely forested areas in the country can be found in the Southeast. More vegetation means more water will be absorbed, leading to less runoff. Contrast this with, say, the Plains, where the soil is composed of red clay (which is very poor at absorbing water) and vegetation is much more limited. Or contrast this with Colorado, which has lots of mountains that funnel large amounts of water into a common area (hence why 1" of rain can cause major flooding in the mountains). Bottom line: if you're forecasting a flash flooding event, you have to take into account more than just the weather pattern; you have to know something about the current landscape. Problem is, observations of soil moisture and vegetation have very poor coverage and are not made very frequently.
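
Here's the training-convection arithmetic from item b above as a tiny sketch, using the same hypothetical one-inch storms:

```python
storm_rainfall_inches = [1.0, 1.0, 1.0, 1.0, 1.0]   # five identical storms

training_total = sum(storm_rainfall_inches)    # every storm crosses the same point
spread_out_total = max(storm_rainfall_inches)  # each point sees roughly one storm
print(f"Storms training over one spot: {training_total:.1f} inches there")
print(f"Line sweeping west to east: roughly {spread_out_total:.1f} inch at any one spot")
```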

IV. Fire Weather Events

        a. Weather conditions that favor rapid wildfire spread vary significantly from region to region. One of the research projects I did while in college was with the Storm Prediction Center's fire weather operations. We were trying to determine when conditions were favorable for rapid wildfire spread in the Southeast. As an example, one value that's often used to predict fire weather is relative humidity. In the Desert Southwest, a relative humidity of 10% or lower is usually considered favorable for rapidly spreading fires. Some wildfires in the Southeast, however, have occurred when the relative humidity was as high as 70%, meaning those fires were occurring despite a significant amount of moisture being present in the air.

        b. The health and type of vegetation are crucial components. Just as with forecasting flash flooding, the condition of regional vegetation, as well as its behavior, is important for diagnosing when rapid wildfire spread is favored. If the vegetation is dead and dried out, it will burn more readily than vegetation that is still alive and receiving healthy amounts of water. Another factor that is just as important is how vegetation behaves when subjected to heavy rain. "Fuel loading" is the term that describes the buildup of plants and trees during periods of plentiful rainfall. Subsequent drying means all of this vegetation can become fuel for a potential wildfire. If the wet season is not wet, plant growth is stunted, and there will be less fuel for wildfires when conditions dry out. Knowing how regional plants behave when subjected to certain stresses (including heavy rain events, drought, extreme heat, and extreme cold) is important when determining what the wildfire threat will be ahead of time.

        c. Truly verifying "critical fire weather criteria" requires that fires occur. Suppose you're expecting sustained winds of 50 mph, relative humidities of around 5%, and very dry fuels. This is a setup that would favor extremely rapid wildfire spread, if a fire were to occur. What if nothing lights a spark and no fires form? Was the forecast inaccurate? Technically, no. A forecast for "critical fire weather criteria" is usually verified by determining whether the criteria were met, not whether any wildfires actually occurred. That said, this does raise a hard-to-answer question when no fires occur: were those conditions really favorable for rapid wildfire spread? Without a fire occurring, there is no way to know for sure.
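
To make the verification point concrete, here's a sketch of a criteria check with region-dependent thresholds. The 10% relative humidity figure for the Desert Southwest comes from the discussion above; the Southeast threshold and both wind thresholds are purely illustrative assumptions.

```python
# (max relative humidity %, min sustained wind mph) -- illustrative thresholds only
CRITICAL_THRESHOLDS = {
    "desert_southwest": (10.0, 20.0),
    "southeast":        (40.0, 15.0),
}

def critical_fire_weather(region, rh_percent, wind_mph, fuels_are_dry):
    """Return True if the region's critical fire weather criteria are met."""
    max_rh, min_wind = CRITICAL_THRESHOLDS[region]
    return fuels_are_dry and rh_percent <= max_rh and wind_mph >= min_wind

# Criteria met -- but whether a fire ever actually starts is out of our hands.
print(critical_fire_weather("desert_southwest", rh_percent=5, wind_mph=50, fuels_are_dry=True))   # True
print(critical_fire_weather("southeast", rh_percent=65, wind_mph=25, fuels_are_dry=True))         # False
```

Notice how the second case comes back False even though item a described Southeast fires occurring near 70% relative humidity: thresholds tuned for one region can quietly fail in another, and even a True result says nothing about whether a fire will ever ignite.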

Part Four: Conclusions

A forecast philosophy I've preached to many of my students goes as follows: "Forecasting the weather is not about trying to get things right; it's about trying to minimize how wrong you are." There is no such thing as a perfect forecast. Some are more accurate than others, but even the most accurate forecasts have their flaws. It's up to the forecaster to use their meteorological background and the tools at hand to come up with a "best forecast" or a "forecast that gets as few things wrong as possible".

I suppose it's somewhat of a minor miracle that meteorologists are able to get anything right. After all, none of the data they have is perfect, so they're basically destined to get things wrong before they even get a chance to look at the data.

Fortunately, the accuracy of weather forecasts has improved significantly over the years. As we unlock more effective modeling techniques and more powerful computers, that accuracy should continue to improve. If you're a member of the general public reading this, hopefully you now have a better idea of why forecasting the weather is so challenging, and I hope you'll be more understanding when a forecast goes wrong.