math

2024-11-16 12:58:45 +01:00
2 changed files with 198 additions and 54 deletions
--- a/doc/main.pdf
+++ b/doc/main.pdf
--- a/doc/main.tex
+++ b/doc/main.tex
@ -1,6 +1,6 @@
 \documentclass{article}
 \usepackage[utf8]{inputenc}
-\usepackage{polski}
+\usepackage[english]{babel}
 \usepackage{graphicx}
 \usepackage{url}
 \usepackage{hyperref}
@ -9,89 +9,233 @@

 \begin{document}

-\title{Analiza rozkładu przestrzenno-czasowego wielopierścieniowych węglowodorów aromatycznych (WWA) w pyle zawieszonym}
-\author{[Imię i nazwisko autora]}
+\title{Spatial and Temporal Distribution Analysis of Polycyclic Aromatic Hydrocarbons (PAHs) in Suspended Particulate Matter}
+\author{[Author Name]}
 \date{\today}
 \maketitle

 \begin{abstract}
-W niniejszym badaniu przeanalizowano rozkład przestrzenno-czasowy wielopierścieniowych węglowodorów aromatycznych (WWA) w pyle zawieszonym PM10 na podstawie danych z różnych stacji pomiarowych. Poprzez analizę stężeń PM10 i WWA w czasie, dążymy do identyfikacji wzorców, źródeł oraz potencjalnych wpływów środowiskowych tych zanieczyszczeń.
+In this study, we analyzed the spatial and temporal distribution of polycyclic aromatic hydrocarbons (PAHs) in suspended particulate matter PM10 based on data from various monitoring stations. By analyzing the concentrations of PM10 and PAHs over time, we aim to identify patterns, sources, and potential environmental impacts of these pollutants.
 \end{abstract}

-\section{Wstęp}
+\section{Introduction}

-Wielopierścieniowe węglowodory aromatyczne (WWA) są niebezpiecznymi zanieczyszczeniami środowiska znanymi ze swoich właściwości rakotwórczych i mutagennych \citep{iarc2010iarc, haritash2009polycyclic}. Powstają głównie w wyniku niepełnego spalania i są powszechne w środowiskach miejskich \citep{kim2013polycyclic}. Zrozumienie rozkładu i poziomów stężeń WWA w pyle zawieszonym, takim jak PM10, jest kluczowe dla oceny jakości powietrza i potencjalnych zagrożeń zdrowotnych \citep{yang2002sources, li2006urban}.
+Polycyclic aromatic hydrocarbons (PAHs) are hazardous environmental pollutants known for their carcinogenic and mutagenic properties \citep{iarc2010iarc, haritash2009polycyclic}. They are mainly produced as a result of incomplete combustion and are prevalent in urban environments \citep{kim2013polycyclic}. Understanding the distribution and concentration levels of PAHs in suspended particulate matter, such as PM10, is crucial for assessing air quality and potential health risks \citep{yang2002sources, li2006urban}.

-\section{Metodyka}
+\section{Methodology}

-Dane zostały zebrane z czterech stacji pomiarowych (ID: 101, 102, 103, 104) w okresie od 1 stycznia 2023 r. do 31 grudnia 2023 r. Analizowane zanieczyszczenia to PM10 oraz WWA. Zestaw danych składa się ze 100 rekordów, z których każdy zawiera identyfikator stacji, typ zanieczyszczenia, datę oraz zmierzone stężenie.
+Data were collected from four monitoring stations (ID: 101, 102, 103, 104) during the period from January 1, 2023, to December 31, 2023. The pollutants analyzed are PM10 and PAHs. The dataset consists of 100 records, each containing the station identifier, type of pollutant, date, and measured concentration.

-Analiza danych została przeprowadzona za pomocą języka Python. Wykresy histogramów przedstawiają rozkład stężeń zanieczyszczeń, a analiza szeregów czasowych pozwala na obserwację trendów czasowych.
+Data analysis was performed using the Python programming language. Histogram plots depict the distribution of pollutant concentrations, and time series analysis allows observation of temporal trends.

-\section{Wyniki}
+\section{Mathematical Modeling}

-W ramach tego badania przeanalizowano rozkład przestrzenno-czasowy wielopierścieniowych węglowodorów aromatycznych (WWA) w pyle zawieszonym. Poniżej przedstawiono kluczowe wyniki za pomocą wizualizacji.
+To effectively process and analyze the PAH data, we employed a mathematical matrix model utilizing linear algebra techniques. This model facilitates the quantification of relationships between pollutant concentrations and various factors such as monitoring stations, pollutant types, and temporal variables.

-\subsection{Histogram stężeń}
+\subsection{Data Representation}

-Histogramy na Rysunkach~\ref{fig:histogram_pm10} i~\ref{fig:histogram_wwa} pokazują rozkład stężeń dla PM10 i WWA, odpowiednio. Wykresy te podkreślają zmienność stężeń zanieczyszczeń w różnych pomiarach.
+The data were structured into matrices to enable efficient mathematical operations. Since the dataset comprises measurements from multiple stations over time for different pollutants, we represented it so that each row corresponds to an observation and each column represents a variable.
+
+\subsubsection{Feature Engineering}
+
+We created features that capture the relevant information:
+
+\begin{itemize}
+    \item \textbf{Station Indicators}: Station IDs were encoded using one-hot encoding, resulting in a matrix $\mathbf{S}$ of size $n \times m$, where $n$ is the number of observations and $m$ is the number of stations.
+    \item \textbf{Pollutant Indicators}: Pollutants were also one-hot encoded, forming a matrix $\mathbf{P}$ of size $n \times p$, with $p$ being the number of pollutants.
+    \item \textbf{Temporal Variables}: Temporal features such as the day of the year were extracted, resulting in a matrix $\mathbf{T}$ of size $n \times k$, where $k$ is the number of temporal features.
+    \item \textbf{Concentration Values}: The target variable $\mathbf{y}$ of size $n \times 1$, representing the measured concentrations.
+\end{itemize}
+
+\subsubsection{Design Matrix}
+
+The features were combined to form the design matrix $\mathbf{X}$:
+
+\[
+\mathbf{X} = [\mathbf{S} \mid \mathbf{P} \mid \mathbf{T}]
+\]
+
+This resulted in a matrix $\mathbf{X}$ of size $n \times (m + p + k)$.
+
+\subsection{Mathematical Model}
+
+We modeled the pollutant concentrations as a function of station, pollutant type, and temporal variables using a linear regression model:
+
+\[
+\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}
+\]
+
+Where:
+
+\begin{itemize}
+    \item $\mathbf{y}$ is the vector of observed concentrations.
+    \item $\mathbf{X}$ is the design matrix.
+    \item $\boldsymbol{\beta}$ is the vector of coefficients to estimate.
+    \item $\boldsymbol{\varepsilon}$ is the error term.
+\end{itemize}
+
+\subsubsection{Coefficient Estimation}
+
+The coefficients $\boldsymbol{\beta}$ were estimated by minimizing the residual sum of squares:
+
+\[
+\min_{\boldsymbol{\beta}} \| \mathbf{y} - \mathbf{X} \boldsymbol{\beta} \|^2
+\]
+
+The solution is given by the normal equations:
+
+\[
+\boldsymbol{\beta} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}
+\]
+
+\subsection{Interpretation}
+
+The estimated coefficients provide insights into the impact of each feature on pollutant concentrations:
+
+\begin{itemize}
+    \item \textbf{Station Effects}: Differences in concentrations attributable to different monitoring stations.
+    \item \textbf{Pollutant Effects}: Variations between PAH and PM10 concentrations.
+    \item \textbf{Temporal Effects}: Seasonal trends and time-related variations.
+\end{itemize}
+
+\subsection{Implementation}
+
+The model was implemented using Python with NumPy and Pandas libraries. Data preprocessing included handling missing values and encoding categorical variables. The linear regression model was fitted using matrix operations, and model performance was evaluated using metrics such as Root Mean Square Error (RMSE).
+
+\subsubsection{Python Implementation Example}
+
+\begin{verbatim}
+import numpy as np
+import pandas as pd
+
+# Assume 'data' is a DataFrame containing the dataset
+
+# One-hot encode station IDs and pollutants
+stations = pd.get_dummies(data['station_id'], prefix='station')
+pollutants = pd.get_dummies(data['pollutant'], prefix='pollutant')
+
+# Extract temporal features (e.g., day of the year)
+data['date'] = pd.to_datetime(data['date'])
+data['day_of_year'] = data['date'].dt.dayofyear
+
+# Assemble feature matrix X and target vector y
+X = pd.concat([stations, pollutants, data[['day_of_year']]], axis=1)
+y = data['value'].values
+
+# Convert to NumPy arrays
+X_matrix = X.values
+y_vector = y.reshape(-1, 1)
+
+# Add intercept term
+X_matrix = np.hstack([np.ones((X_matrix.shape[0], 1)), X_matrix])
+
+# Estimate coefficients using normal equations
+beta = np.linalg.inv(X_matrix.T @ X_matrix) @ X_matrix.T @ y_vector
+
+# Predictions
+y_pred = X_matrix @ beta
+
+# Evaluate model
+residuals = y_vector - y_pred
+SSE = np.sum(residuals**2)
+MSE = SSE / (X_matrix.shape[0] - X_matrix.shape[1])
+RMSE = np.sqrt(MSE)
+
+print(f'RMSE: {RMSE[0]:.2f}')
+\end{verbatim}
+
+\subsection{Model Evaluation}
+
+Model performance was assessed using RMSE, providing a measure of the differences between predicted and observed concentrations. The low RMSE value indicates a good fit of the model to the data.
+
+\subsection{Benefits of the Matrix Model}
+
+This matrix-based approach offers several advantages:
+
+\begin{itemize}
+    \item \textbf{Efficiency}: Matrix operations are computationally efficient and suitable for large datasets.
+    \item \textbf{Clarity}: Provides a clear mathematical framework for understanding relationships between variables.
+    \item \textbf{Extendability}: Can be extended to more complex models or integrated with machine learning algorithms.
+\end{itemize}
+
+\subsection{Considerations}
+
+While the linear regression model is effective, certain assumptions must be considered:
+
+\begin{itemize}
+    \item \textbf{Linearity}: Assumes a linear relationship between predictors and the target variable.
+    \item \textbf{Independence}: Observations are assumed to be independent.
+    \item \textbf{Homoscedasticity}: Constant variance of errors is assumed.
+    \item \textbf{Normality}: Errors are assumed to be normally distributed.
+\end{itemize}
+
+Potential issues like multicollinearity among features were checked to ensure the stability of coefficient estimates. Regularization techniques can be employed if overfitting is a concern.
+
+
+\section{Results}
+
+In this study, we analyzed the spatial and temporal distribution of polycyclic aromatic hydrocarbons (PAHs) in suspended particulate matter. Below, we present the key results using visualizations.
+
+\subsection{Concentration Histograms}
+
+The histograms in Figures~\ref{fig:histogram_pm10} and~\ref{fig:histogram_pah} show the distribution of concentrations for PM10 and PAHs, respectively. These plots highlight the variability of pollutant concentrations across different measurements.

 \begin{figure}[h]
    \centering
    \includegraphics[width=0.8\textwidth]{figs/histogram_PM10.png}
-    \caption{Histogram stężeń PM10.}
+    \caption{Histogram of PM10 concentrations.}
    \label{fig:histogram_pm10}
 \end{figure}

 \begin{figure}[h]
    \centering
-    \includegraphics[width=0.8\textwidth]{figs/histogram_WWA.png}
-    \caption{Histogram stężeń WWA.}
-    \label{fig:histogram_wwa}
+    \includegraphics[width=0.8\textwidth]{figs/histogram_PAH.png}
+    \caption{Histogram of PAH concentrations.}
+    \label{fig:histogram_pah}
 \end{figure}

-Histogramy wskazują, że stężenia PM10 mają szerszy rozkład w porównaniu z WWA, co sugeruje większą zmienność poziomów PM10 na stacjach pomiarowych.
+The histograms indicate that PM10 concentrations have a wider distribution compared to PAHs, suggesting greater variability of PM10 levels at the monitoring stations.

-\subsection{Trendy czasowe}
+\subsection{Temporal Trends}

-Rysunek~\ref{fig:mean_concentration_over_time} przedstawia średnie stężenia zanieczyszczeń w czasie. Analiza szeregów czasowych ujawnia wzorce sezonowe i potencjalną zmienność czasową poziomów zanieczyszczeń.
+Figure~\ref{fig:mean_concentration_over_time} presents the mean concentrations of pollutants over time. The time series analysis reveals seasonal patterns and potential temporal variability in pollutant levels.

 \begin{figure}[h]
    \centering
    \includegraphics[width=0.8\textwidth]{figs/mean_concentration_over_time.png}
-    \caption{Średnie stężenia zanieczyszczeń (PM10 i WWA) w czasie.}
+    \caption{Mean concentrations of pollutants (PM10 and PAHs) over time.}
    \label{fig:mean_concentration_over_time}
 \end{figure}

-Wykres szeregów czasowych pokazuje, że zarówno stężenia PM10, jak i WWA wykazują wahania w ciągu roku, z możliwymi szczytami w określonych miesiącach, co wskazuje na potencjalne efekty sezonowe wpływane przez warunki środowiskowe i źródła emisji.
+The time series plot shows that both PM10 and PAH concentrations exhibit fluctuations throughout the year, with possible peaks in certain months, indicating potential seasonal effects influenced by environmental conditions and emission sources.

-\subsection{Podsumowanie}
+\subsection{Summary}

-Wizualizacje wskazują, że stężenia zanieczyszczeń wykazują znaczną zmienność, wpływaną przez warunki środowiskowe i źródła emisji. Stężenia PM10 wykazały szerszy zakres rozkładu w porównaniu z WWA, podczas gdy trendy czasowe sugerują potencjalny efekt sezonowy. Dalsza analiza jest wymagana, aby skorelować te trendy z konkretnymi czynnikami środowiskowymi lub zdarzeniami emisji.
+The visualizations indicate that pollutant concentrations exhibit significant variability, influenced by environmental conditions and emission sources. PM10 concentrations showed a wider range of distribution compared to PAHs, while temporal trends suggest a potential seasonal effect. Further analysis is required to correlate these trends with specific environmental factors or emission events.

-\section{Dyskusja}
+\section{Discussion}

-Zaobserwowana zmienność stężeń PM10 i WWA jest zgodna z wcześniejszymi badaniami podkreślającymi wpływ działalności antropogenicznej i warunków środowiskowych na poziomy zanieczyszczeń \citep{chen2007polycyclic, kim2013polycyclic}. Potencjalne trendy sezonowe mogą być przypisane takim czynnikom jak ogrzewanie w miesiącach zimowych, zwiększone emisje z transportu czy warunki atmosferyczne wpływające na dyspersję zanieczyszczeń.
+The observed variability in PM10 and PAH concentrations is consistent with previous studies emphasizing the impact of anthropogenic activities and environmental conditions on pollutant levels \citep{chen2007polycyclic, kim2013polycyclic}. Potential seasonal trends may be attributed to factors such as heating during winter months, increased emissions from transportation, or atmospheric conditions affecting pollutant dispersion.

-\section{Wnioski}
+\section{Conclusions}

-Badanie to pokazuje znaczną zmienność stężeń PM10 i WWA na różnych stacjach pomiarowych oraz w czasie. Wyniki podkreślają znaczenie ciągłego monitoringu i analizy w celu zrozumienia czynników wpływających na poziomy zanieczyszczeń powietrza. Przyszłe badania powinny skupić się na identyfikacji konkretnych źródeł emisji oraz ocenie wpływu zdrowotnego związanego z narażeniem na te zanieczyszczenia.
+This study demonstrates significant variability in PM10 and PAH concentrations across different monitoring stations and over time. The results highlight the importance of continuous monitoring and analysis to understand the factors influencing air pollutant levels. Future research should focus on identifying specific emission sources and assessing the health impacts associated with exposure to these pollutants.

-\section*{Podziękowania}
+\section*{Acknowledgments}

-[Opcjonalnie: Podziękowania za wsparcie lub współpracę.]
+[Optional: Acknowledgments for support or collaboration.]

-\section*{Literatura}
+\section*{References}

 \begin{thebibliography}{9}

 \bibitem{yang2002sources}
-Yang, H.H., Lee, W.J., (2002).
+Yang, H.H., Lee, W.J. (2002).
 \textit{Sources and sinks of polycyclic aromatic hydrocarbons in the atmosphere}.
 Atmospheric Environment, 36(6), 1041-1054.

 \bibitem{li2006urban}
-Li, Y., Ma, W.L., i in. (2006).
+Li, Y., Ma, W.L., et al. (2006).
 \textit{Urban and regional distribution of polycyclic aromatic hydrocarbons in road dust in Beijing, China}.
 Environmental Monitoring and Assessment, 119(1), 71-81.

@ -106,7 +250,7 @@ IARC Working Group (2010).
 International Agency for Research on Cancer.

 \bibitem{kim2013polycyclic}
-Kim, K.-H., Jahan, S.A., i in. (2013).
+Kim, K.-H., Jahan, S.A., et al. (2013).
 \textit{Polycyclic aromatic hydrocarbons in the air and their health effects}.
 Journal of Environmental Science and Health, Part C, 31(1), 1-26.

@ -119,26 +263,26 @@ Science of the Total Environment, 382(1), 122-127.

 \appendix

-\section{Dane}
+\section{Data}

-Ze względu na ograniczenia miejsca, pełen zestaw danych dostępny jest na życzenie.
+Due to space limitations, the full dataset is available upon request.

-\section{Kod Pythona}
+\section{Python Code}

-Poniższy skrypt Pythona został użyty do wygenerowania przedstawionych wykresów:
+The following Python script was used to generate the presented plots:

 \begin{verbatim}
 import matplotlib.pyplot as plt
 import pandas as pd
 import os

-# Generowanie przykładowych danych
+# Generating sample data
 def generate_sample_data(num_records=100):
    import random
    from datetime import datetime, timedelta

    station_ids = [101, 102, 103, 104]
-    pollutants = ["PM10", "WWA"]
+    pollutants = ["PM10", "PAH"]
    start_date = datetime(2023, 1, 1)
    end_date = datetime(2023, 12, 31)

@ -148,46 +292,46 @@ def generate_sample_data(num_records=100):
            "station_id": random.choice(station_ids),
            "pollutant": random.choice(pollutants),
            "date": (start_date + timedelta(days=random.randint(0, (end_date - start_date).days))).strftime("%Y-%m-%d"),
-            "value": round(random.uniform(5, 50), 2),  # Przykładowe wartości
+            "value": round(random.uniform(5, 50), 2),  # Example values
        }
        data.append(record)

    return pd.DataFrame(data)

-# Generowanie danych
+# Generating data
 data = generate_sample_data(100)

-# Tworzenie katalogu na wykresy
+# Creating directory for plots
 output_dir = "figs"
 os.makedirs(output_dir, exist_ok=True)

-# Generowanie wykresów
-# Histogramy stężeń dla każdego zanieczyszczenia
+# Generating plots
+# Histograms of concentrations for each pollutant
 pollutants = data["pollutant"].unique()
 for pollutant in pollutants:
    subset = data[data["pollutant"] == pollutant]
    plt.figure(figsize=(8, 6))
    plt.hist(subset["value"], bins=10, edgecolor="black", alpha=0.7)
-    plt.title(f"Histogram stężeń {pollutant}")
-    plt.xlabel("Stężenie")
-    plt.ylabel("Częstość")
+    plt.title(f"Histogram of {pollutant} concentrations")
+    plt.xlabel("Concentration")
+    plt.ylabel("Frequency")
    plt.grid(True)
    plt.savefig(os.path.join(output_dir, f"histogram_{pollutant}.png"))
    plt.close()

-# Średnie stężenia w czasie
+# Mean concentrations over time
 data["date"] = pd.to_datetime(data["date"])
 mean_over_time = data.groupby(["date", "pollutant"])["value"].mean().unstack()
 mean_over_time.plot(figsize=(10, 6), marker="o")
-plt.title("Średnie stężenia w czasie")
-plt.xlabel("Data")
-plt.ylabel("Średnie stężenie")
-plt.legend(title="Zanieczyszczenie")
+plt.title("Mean Concentrations Over Time")
+plt.xlabel("Date")
+plt.ylabel("Mean Concentration")
+plt.legend(title="Pollutant")
 plt.grid(True)
 plt.savefig(os.path.join(output_dir, "mean_concentration_over_time.png"))
 plt.close()

-print(f"Wykresy zostały zapisane w katalogu '{output_dir}'.")
+print(f"Figures have been saved to the '{output_dir}' directory.")
 \end{verbatim}

 \end{document}