Temperatures & Visualization

What’s the best way to guess the temperature at the top of a mountain?

Climate.

Intro and Foundations

Today we are going to examine multiple dataframes using sqlite3. These dataframes include NOAA station metadata, FIPS_10-4 & ISO 3166 ID codes for countries, and NOAA station temperature data.

☀1

First off, in order to execute our plan, we need to import our initial models.

import pandas as pd
import sqlite3 
from plotly import express as px
from plotly.io import write_html
import numpy as np
from sklearn.linear_model import LinearRegression
import plotly.graph_objects as go
from plotly.io import write_html

Now that our modules are imported, lets import our csvs, and use SQL to combine them into a SQL database!

url = "https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/datasets/noaa-ghcn/station-metadata.csv"

stations = pd.read_csv(url)
temps = pd.read_csv("temps_stacked.csv")

countries = pd.read_csv('countries.csv')
# sql doesn't like spaces, so we edit the column names to get rid of it 
countries = countries.rename(columns= {"FIPS 10-4": "FIPS_10-4"})

We can take a look at list(countries.columns) + list(temps.columns) + list(stations.columns) to see the available columns.

[‘FIPS_10-4’, ‘ISO 3166’, ‘Name’, ‘ID’, ‘Year’, ‘Month’, ‘Temp’, ‘ID’, ‘LATITUDE’, ‘LONGITUDE’, ‘STNELEV’, ‘NAME’]

Now we can create our database using sql.

#open a connection to temps.db so that you can 'talk' to it using python
conn = sqlite3.connect("temps.db")

temps.to_sql("temperatures", conn, if_exists="replace", index=False)
countries.to_sql("countries", conn, if_exists="replace", index=False)
stations.to_sql("stations", conn, if_exists="replace", index=False)

# always close your connection
conn.close()

☀2

Now that we have created our database, lets try creating a query function called “query_climate_database()” that will obtain temperatures for a range of years on a specific month, for any country selected.

def query_climate_database(country, year_begin, year_end, month):
    #obtain 2 letter country ID
    countree = countries[countries['Name']== str(country)].iloc[0]['FIPS_10-4']
    
    #connect to the database
    conn = sqlite3.connect("temps.db")
    
    cmd_text_1 = f'''
    SELECT S.name, S.latitude, S.longitude, SUBSTRING(S.id, 1, 2) Country, T.year, T.month, T.temp 
    FROM temperatures T 
    LEFT JOIN stations S ON T.id = S.id 
    WHERE (SUBSTRING(S.id,1,2) = '{countree}') AND (T.year BETWEEN {year_begin} AND {year_end}) AND (T.month == {month})  
    '''

    cmd = cmd_text_1 
 
    #write the cmd into a df
    df = pd.read_sql_query(cmd, conn)
    
    conn.close() #close our connection
    
    return df #output our df

Lets Implement our Function!

☀3

Using our function, now lets take this function and apply some graphical functions on top of it! Lets create a function called temperature_coefficient_plot that will create a scatter plot on top of a global map image that shows the average yearly temperature change for a selected month in a selected country.

#this function will use LinearRegression to find the coefficient of Year x Temp and return the value
def coef(dfs):
    X = dfs[["Year"]]
    y = dfs["Temp"]
    LR = LinearRegression()
    LR.fit(X, y)
    slope = LR.coef_[0]
    return slope

#this function takes our df, converts numbers and adds required columns to our final df
def change_in_temp_creator(df):
    
    #make the column; groupbyname -> isolate "Temp" colum and find length... add length to
    #the column
    df['obs'] = df.groupby(['NAME'])["Temp"].transform(len)       

    #rounding out the long/lat
    df[["LATITUDE", "LONGITUDE"]] = df[["LATITUDE", "LONGITUDE"]].round(2)

    #mask the 'obs' and get rid of all rows w/ obs < 10
    df = df[df['obs'] >= 10]
    
    #apply our coef function with the selected dataframe
    coefs = df.groupby(["NAME", "Month","LATITUDE", "LONGITUDE"]).apply(coef)
    coefs = coefs.reset_index() #resets our index
    
    #change column name to be more descriptive
    coefs.rename(columns = {0: "Est. yearly increase Celcius"}, inplace = True)
    coefs["Est. yearly increase Celcius"] = coefs["Est. yearly increase Celcius"].round(4) #round our values
    
    return (coefs) #df

#this is our core function to output our eventual plot 
def temperature_coefficient_plot(country, year_begin, year_end, month, min_obs, **kwargs):
    #init df
    df = query_climate_database(country, year_begin, year_end, month)
    
    df1 = change_in_temp_creator(df)
    
    month_name = months[str(month)]
    
    fig = px.scatter_mapbox(df1,
                        lat = "LATITUDE", # column name for latitude informataion
                        lon = "LONGITUDE", # column name for longitude information
                        opacity = 1,
                        hover_name = "NAME",
                        hover_data= ["Est. yearly increase Celcius"],
                        color = "Est. yearly increase Celcius",
                        range_color=[-0.1,0.1],
                        **kwargs)
                        
    
    fig.update_layout(
        title={
            'text': f"Est. in yearly temp. increase in {month_name} for stations in {country}, years {year_begin} - {year_end}",
            'y':0.96,
            'x':0.52,
            'xanchor': 'center',
            'yanchor': 'top'})

    
    return fig

color_map = px.colors.sequential.thermal #set a color map

fig = temperature_coefficient_plot("India", 1980, 2020, 1, 
                                   min_obs = 10,
                                   zoom = 3,
                                   mapbox_style="carto-positron",
                                   color_continuous_scale = color_map)

fig.show()

From the plot, we can see the distribution of the stations in India, in addition to their increase coefficient on the map. Additionally, we can input new parameters and reuse our function!

color_map = px.colors.sequential.thermal # choose a colormap

fig = temperature_coefficient_plot("China", 1975, 2020, 12, 
                                   min_obs = 20,
                                   zoom = 2,
                                   mapbox_style="carto-positron",
                                   color_continuous_scale = color_map)

fig.show()

Lets explore our Database some more!

☀4a)

Lets say we want more detailed descriptive statistics on the temperatures collected from our global stations. Using a boxplot, we can easily obtain and visualize such data.

Lets create a function that will take in a specific country, a time range, and additional graphing parameters and that outputs a boxplot accordingly.

#this is an SQL query function that will output our desired df based on the inputs
def boxplot_stations(country, year_begin, year_end): 
    #obtain 2 letter country ID
    countree = countries[countries['Name']== str(country)].iloc[0]['FIPS_10-4']

    #connect to the database
    conn = sqlite3.connect("temps.db")
    
    cmd_text_1 = f'''
    SELECT S.name, SUBSTRING(S.id, 1, 2) Country, T.year, T.month, T.temp 
    FROM temperatures T 
    LEFT JOIN stations S ON T.id = S.id 
    WHERE (SUBSTRING(S.id,1,2) = '{countree}') AND (T.year BETWEEN {year_begin} AND {year_end})   
    '''
    
    cmd = cmd_text_1 
 
    #write the cmd into a df
    df = pd.read_sql_query(cmd, conn)
    
    conn.close() #close our connection
    
    return df #output our df

#contains the core code to plot the dataframe
def boxplot_creator(country, year_begin, year_end, **kwargs):
    #init df
    df = boxplot_stations(country, year_begin, year_end)
    
    fig = px.box(df,
                y = "Temp",
                animation_frame = "Year",
                notched = True,
                width = 700,
                height = 700,
                color = "Year",
                title = f"Boxplot of Temps in {country} over {year_begin} - {year_end}",
                **kwargs)
    
    return fig

color = px.colors.sequential.Plasma #creating our colormap

fig = boxplot_creator("Russia", 1980, 2000,
                     color_discrete_sequence = color)

fig.show()

Looking at our plot, we can now get an time series of the descriptive statistics in a specific Country.

☀4b)

Additionally, lets now create another function to look at the distribution of stations and their elevations of a specific country during a certain year.

Lets create a function that will take in a country, year, and additional graphing parameters and outputs the correct mapbox accordingly.

def obtain_stnelevs(country, year_begin): 
    countree = countries[countries['Name']== str(country)].iloc[0]['FIPS_10-4']
    
    #connect to the database
    conn = sqlite3.connect("temps.db")
    
    cmd_text_1 = f'''
    SELECT S.STNELEV as Elevation, S.name, S.latitude, S.longitude, SUBSTRING(S.id, 1, 2) Country, T.year 
    FROM temperatures T 
    LEFT JOIN stations S ON T.id = S.id 
    WHERE (SUBSTRING(S.id,1,2) = '{countree}') AND (T.year = {year_begin})
    '''

    cmd = cmd_text_1 
 
    #write the cmd into a df
    df = pd.read_sql_query(cmd, conn)
    
    conn.close() #close our connection
    
    return df #output our df

def stnelev_plot(country, year_begin, **kwargs):
    
    #init df
    df = obtain_stnelevs(country, year_begin)
    
    #clean up the df
    #because indexes where station name are the same contain the same
    #lat/long/elevation, we should clean all duplicates so we have unique stations
    df1 = df.drop_duplicates().reset_index().drop(["index"], axis = 1)
    
    fig = px.scatter_mapbox(df1,
                    lat = "LATITUDE", # column name for latitude informataion
                    lon = "LONGITUDE", # column name for longitude information
                    opacity = 0.5,
                    hover_name = "NAME",
                    hover_data= ["Elevation"],
                    color = "Elevation",
                    range_color = [0, 5000],
                    **kwargs)

    
    return fig

color_map = px.colors.diverging.Spectral

fig = stnelev_plot("United States", 2020, mapbox_style= 'carto-darkmatter',
                    color_continuous_scale = color_map, zoom = 3, )

fig.show()

Look at the diagram, we can now examine any country in the stations database at any available year to understand the distribution and elevations of the stations.

Thanks for reading! See you next time.

Written on April 15, 2022