<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <title>Jien Weng</title>
    <subtitle>Research notes, essays, publications, and working papers by Jien Weng.</subtitle>
    <link rel="self" type="application/atom+xml" href="https://jienweng.github.io/atom.xml"/>
    <link rel="alternate" type="text/html" href="https://jienweng.github.io"/>
    <generator uri="https://www.getzola.org/">Zola</generator>
    <updated>2026-03-03T00:00:00+00:00</updated>
    <id>https://jienweng.github.io/atom.xml</id>
    <entry xml:lang="en">
        <title>Making Linear Algebra Make Sense with Gemini CLI</title>
        <published>2026-03-03T00:00:00+00:00</published>
        <updated>2026-03-03T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/notes/linear-algebra-sense-making/"/>
        <id>https://jienweng.github.io/notes/linear-algebra-sense-making/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/notes/linear-algebra-sense-making/">&lt;p&gt;This note shows a practical way to teach linear algebra to adult learners: start from geometric intuition, then move to notation only when needed. The goal is to make vectors, linear maps, and matrix multiplication operational rather than abstract. If you teach, self-study, or mentor beginners, this is a compact framework you can reuse.&lt;&#x2F;p&gt;
&lt;p&gt;Linear algebra is the engine behind AI and 3D graphics. Most adult learners find it dry and abstract. Textbooks often assume you have endless patience for matrix operations. Adults need things to actually make sense. They have real jobs and limited time. They cannot spend weeks wondering why they are multiplying rows by columns.&lt;&#x2F;p&gt;
&lt;p&gt;Last weekend I tried something different. I used Gemini CLI to help a learner who was stuck on the traditional formula-first path. The goal was simple: find the sweet spot between math and common sense. We moved away from the rigid structure of a classroom and treated the math like a puzzle to solve together.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-gemini-cli&quot;&gt;Why Gemini CLI&lt;&#x2F;h2&gt;
&lt;p&gt;Standard textbooks are a one-way street. You read and you struggle. If it does not click you are just stuck. Gemini CLI changes the game. It creates an interactive sense-making loop. It acts as a bridge between the abstract symbols and the human intuition.&lt;&#x2F;p&gt;
&lt;p&gt;When my learner hit a wall, we did not just re-read definitions. We asked Gemini to explain concepts through things he already knew. We used the CLI to quickly pivot between explanations until one finally clicked. This speed is critical. In a normal setting a student might wait days for a tutor to find a better analogy. With the CLI we found five analogies in five minutes. We kept the momentum alive.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;balancing-math-and-meaning&quot;&gt;Balancing Math and Meaning&lt;&#x2F;h2&gt;
&lt;p&gt;The biggest hurdle is the gap between symbols and significance. Computing a dot product is easy. Explaining why it matters is the hard part. Many people can do the math but few can see the picture.&lt;&#x2F;p&gt;
&lt;p&gt;Gemini worked as an amazing teaching assistant. We followed a specific flow.&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Iterate on Intuition:&lt;&#x2F;strong&gt; Start with a simple analogy. Maybe we talk about how light hits a surface or how similar two songs are. Then we tighten it with math.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Context:&lt;&#x2F;strong&gt; Instantly see where these concepts live in the real world. We would ask for a Python snippet that used the concept we just discussed.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Bridge the Gap:&lt;&#x2F;strong&gt; Translate intuition back into formal notation. Once the learner understands the why, the how becomes trivial.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
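&lt;p&gt;As a sketch of step 2, here is the kind of minimal Python snippet we would ask for. The song feature numbers are invented for illustration:&lt;&#x2F;p&gt;

```python
# Dot product as a similarity score between two "taste" vectors.
# The feature values below are made up for illustration.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Each vector: (tempo, energy, acousticness) on a 0-to-1 scale.
song_a = [0.9, 0.8, 0.1]
song_b = [0.8, 0.9, 0.2]   # similar vibe to song_a
song_c = [0.1, 0.2, 0.9]   # very different vibe

print(dot(song_a, song_b))  # large: the vectors point the same way
print(dot(song_a, song_c))  # small: the vectors barely align
```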
&lt;p&gt;The math is not the enemy. The lack of context is. By bringing the math into the CLI we made it tangible. We made it something you can poke and prod until it makes sense.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;it-actually-worked&quot;&gt;It Actually Worked&lt;&#x2F;h2&gt;
&lt;p&gt;The breakthrough happened during our last session. No more frustration. There was a clear aha moment. He was not just following a recipe. He understood the logic. He could see the vectors moving. He could feel the transformation happening in the space.&lt;&#x2F;p&gt;
&lt;p&gt;Using LLMs to build mental models is a total game-changer for adult education. It turns the struggle into a productive exploration. It respects the learner&#x27;s time and intelligence. It does not treat them like a child. It treats them like a collaborator.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;final-thought&quot;&gt;Final Thought&lt;&#x2F;h2&gt;
&lt;p&gt;Linear algebra does not have to be a slog through notation. Balance the rigour with intuition. Use tools like Gemini CLI to make complex ideas accessible. Break down the walls between you and understanding.&lt;&#x2F;p&gt;
&lt;p&gt;Give this AI-assisted intuition method a shot. It might be the bridge you need.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;get-the-sense-maker-s-playbook&quot;&gt;Get the Sense-Maker&#x27;s Playbook&lt;&#x2F;h2&gt;
&lt;p&gt;Stop struggling with dry textbooks. I have compiled the exact method and Gemini CLI prompts I used to help my learner achieve a breakthrough in one weekend.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;The Linear Algebra Sense-Maker&#x27;s PDF includes:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;A roadmap for intuitive linear algebra.&lt;&#x2F;li&gt;
&lt;li&gt;Prompt templates for your own sense-making CLI.&lt;&#x2F;li&gt;
&lt;li&gt;The eigenvector and dot product case studies.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;checkya.com&#x2F;jienweng&#x2F;offers&#x2F;31145&quot;&gt;Support my work and download the PDF to start making sense of math today.&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Correlation</title>
        <published>2026-01-23T00:00:00+00:00</published>
        <updated>2026-01-23T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/notes/regression-04-correlation/"/>
        <id>https://jienweng.github.io/notes/regression-04-correlation/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/notes/regression-04-correlation/">&lt;p&gt;This note defines correlation clearly, explains how to compute it, and separates association from causation. The key issue is that correlation is often over-interpreted as evidence of mechanism. The goal is to make correlation a diagnostic tool, not a conclusion.&lt;&#x2F;p&gt;
&lt;p&gt;In the previous posts, we built the regression framework using $S_{xx}$ and $S_{xy}$ to derive the OLS estimators. Before we move to deeper inferential topics, it is essential to formalize a measure of the strength of the linear association between two variables. This measure is the Pearson correlation coefficient.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;recap-of-key-notation&quot;&gt;Recap of Key Notation&lt;&#x2F;h2&gt;
&lt;p&gt;In &lt;a href=&quot;&#x2F;posts&#x2F;regression-01-simple-linear-regression&quot;&gt;Simple Linear Regression&lt;&#x2F;a&gt;, we introduced:&lt;&#x2F;p&gt;
&lt;p&gt;$$S_{xx} = \sum^n_{i=1}(x_i - \bar{x})^2,$$&lt;&#x2F;p&gt;
&lt;p&gt;$$S_{xy} = \sum^n_{i=1}(x_i - \bar{x})(y_i - \bar{y}).$$&lt;&#x2F;p&gt;
&lt;p&gt;We now introduce the remaining quantity:&lt;&#x2F;p&gt;
&lt;p&gt;$$S_{yy} = \sum^n_{i=1}(y_i - \bar{y})^2.$$&lt;&#x2F;p&gt;
&lt;p&gt;These three summary statistics capture the variability of $x$, the variability of $y$, and their joint variability, respectively.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;sample-standard-deviations&quot;&gt;Sample Standard Deviations&lt;&#x2F;h2&gt;
&lt;p&gt;The sample standard deviations are defined as:&lt;&#x2F;p&gt;
&lt;p&gt;$$S_x = \sqrt{\frac{S_{xx}}{n - 1}} = \sqrt{\frac{1}{n-1}\sum^n_{i=1}(x_i - \bar{x})^2},$$&lt;&#x2F;p&gt;
&lt;p&gt;$$S_y = \sqrt{\frac{S_{yy}}{n - 1}} = \sqrt{\frac{1}{n-1}\sum^n_{i=1}(y_i - \bar{y})^2}.$$&lt;&#x2F;p&gt;
&lt;p&gt;Note that $S_x$ and $S_y$ use the $n - 1$ denominator (Bessel&#x27;s correction), which makes the sample variances $S_x^2$ and $S_y^2$ unbiased estimators of the population variances $\sigma_x^2$ and $\sigma_y^2$. The standard deviations themselves remain slightly biased, but this is the standard convention.&lt;&#x2F;p&gt;
&lt;p&gt;The sample variances are simply:&lt;&#x2F;p&gt;
&lt;p&gt;$$S_x^2 = \frac{S_{xx}}{n-1}, \quad S_y^2 = \frac{S_{yy}}{n-1}.$$&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-pearson-correlation-coefficient&quot;&gt;The Pearson Correlation Coefficient&lt;&#x2F;h2&gt;
&lt;p&gt;The sample Pearson correlation coefficient is defined as:&lt;&#x2F;p&gt;
&lt;p&gt;$$r = \frac{S_{xy}}{\sqrt{S_{xx} \cdot S_{yy}}}.$$&lt;&#x2F;p&gt;
&lt;p&gt;Equivalently, using the standard deviations:&lt;&#x2F;p&gt;
&lt;p&gt;$$r = \frac{\sum^n_{i=1}(x_i - \bar{x})(y_i - \bar{y})}{(n-1) S_x S_y} = \frac{S_{xy}}{(n-1) S_x S_y}.$$&lt;&#x2F;p&gt;
&lt;p&gt;The first form (using $S_{xx}$, $S_{yy}$, $S_{xy}$) is often more convenient for algebraic manipulation, while the second form highlights that $r$ is a standardized measure of covariation.&lt;&#x2F;p&gt;
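&lt;p&gt;The equivalence of the two forms is easy to verify numerically. A small Python sketch, with toy data invented for illustration:&lt;&#x2F;p&gt;

```python
import math

# Toy data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.9]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Summary statistics S_xx, S_yy, S_xy.
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Form 1: r = S_xy / sqrt(S_xx * S_yy).
r1 = sxy / math.sqrt(sxx * syy)

# Form 2: r = S_xy / ((n - 1) * S_x * S_y).
sx = math.sqrt(sxx / (n - 1))
sy = math.sqrt(syy / (n - 1))
r2 = sxy / ((n - 1) * sx * sy)

print(r1, r2)  # both forms agree
```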
&lt;h2 id=&quot;properties-of-the-correlation-coefficient&quot;&gt;Properties of the Correlation Coefficient&lt;&#x2F;h2&gt;
&lt;p&gt;The Pearson correlation coefficient has several important properties:&lt;&#x2F;p&gt;
&lt;h3 id=&quot;1-bounded-between-1-and-1&quot;&gt;1. Bounded between -1 and 1&lt;&#x2F;h3&gt;
&lt;p&gt;$$-1 \leq r \leq 1.$$&lt;&#x2F;p&gt;
&lt;p&gt;This follows from the Cauchy-Schwarz inequality, which states that $(S_{xy})^2 \leq S_{xx} \cdot S_{yy}$, with equality only when all data points lie exactly on a line.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;2-interpretation-of-values&quot;&gt;2. Interpretation of values&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;$r = 1$: perfect positive linear relationship (all points on a line with positive slope).&lt;&#x2F;li&gt;
&lt;li&gt;$r = -1$: perfect negative linear relationship (all points on a line with negative slope).&lt;&#x2F;li&gt;
&lt;li&gt;$r = 0$: no linear relationship (but there may be a nonlinear relationship).&lt;&#x2F;li&gt;
&lt;li&gt;$0 &amp;lt; |r| &amp;lt; 1$: partial linear association, with strength increasing as $|r|$ approaches 1.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;3-symmetry&quot;&gt;3. Symmetry&lt;&#x2F;h3&gt;
&lt;p&gt;$$r_{xy} = r_{yx}.$$&lt;&#x2F;p&gt;
&lt;p&gt;The correlation between $x$ and $y$ is the same as the correlation between $y$ and $x$. This follows because $S_{xy}$ is symmetric in $x$ and $y$.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;4-invariance-under-linear-transformation&quot;&gt;4. Invariance under linear transformation&lt;&#x2F;h3&gt;
&lt;p&gt;If we define $u_i = a + bx_i$ and $v_i = c + dy_i$ with $b &amp;gt; 0$ and $d &amp;gt; 0$, then the correlation between $u$ and $v$ equals the correlation between $x$ and $y$. If either $b$ or $d$ is negative (but not both), the sign of $r$ flips.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;5-dimensionless&quot;&gt;5. Dimensionless&lt;&#x2F;h3&gt;
&lt;p&gt;The correlation coefficient has no units. The numerator $S_{xy}$ has units of $x$ times $y$, and the denominator $\sqrt{S_{xx} \cdot S_{yy}}$ also has units of $x$ times $y$, so they cancel.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;relationship-to-the-regression-slope&quot;&gt;Relationship to the Regression Slope&lt;&#x2F;h2&gt;
&lt;p&gt;Recall from Post 1 that the OLS slope estimator is $\hat{\beta}_1 = S_{xy}&#x2F;S_{xx}$. The correlation coefficient can be expressed in terms of $\hat{\beta}_1$:&lt;&#x2F;p&gt;
&lt;p&gt;$$r = \hat{\beta}_1 \sqrt{\frac{S_{xx}}{S_{yy}}} = \hat{\beta}_1 \cdot \frac{S_x}{S_y}.$$&lt;&#x2F;p&gt;
&lt;p&gt;This shows that $r$ and $\hat{\beta}_1$ always share the same sign. A positive slope corresponds to a positive correlation, and a negative slope corresponds to a negative correlation.&lt;&#x2F;p&gt;
&lt;p&gt;Conversely, we can write the slope as:&lt;&#x2F;p&gt;
&lt;p&gt;$$\hat{\beta}_1 = r \cdot \frac{S_y}{S_x}.$$&lt;&#x2F;p&gt;
&lt;p&gt;This expresses the regression slope as the correlation times the ratio of the standard deviations.&lt;&#x2F;p&gt;
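&lt;p&gt;The identity $\hat{\beta}_1 = r \cdot S_y &#x2F; S_x$ can be checked with the same summary-statistic approach (toy data, for illustration):&lt;&#x2F;p&gt;

```python
import math

# Toy data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.1, 3.8, 5.1]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

beta1 = sxy / sxx                  # OLS slope
r = sxy / math.sqrt(sxx * syy)     # correlation
sx = math.sqrt(sxx / (n - 1))
sy = math.sqrt(syy / (n - 1))

print(beta1, r * sy / sx)  # identical values
```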
&lt;h2 id=&quot;population-correlation&quot;&gt;Population Correlation&lt;&#x2F;h2&gt;
&lt;p&gt;The sample correlation $r$ estimates the population correlation coefficient $\rho$ (rho), defined for a bivariate population as:&lt;&#x2F;p&gt;
&lt;p&gt;$$\rho = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}.$$&lt;&#x2F;p&gt;
&lt;p&gt;The sample correlation $r$ is a consistent estimator of $\rho$ whenever the population variances and covariance exist. The stronger assumption that the data $(x_i, y_i)$ come from a bivariate normal distribution matters when constructing exact tests and confidence intervals for $\rho$.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;correlation-does-not-imply-causation&quot;&gt;Correlation Does Not Imply Causation&lt;&#x2F;h2&gt;
&lt;p&gt;A strong correlation between two variables does not mean that one causes the other. The association might be due to a lurking variable, reverse causation, or coincidence. Regression models describe associations; establishing causation requires careful experimental design or additional assumptions.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;&#x2F;h2&gt;
&lt;p&gt;In this post, we introduced the Pearson correlation coefficient:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;$S_{yy} = \sum(y_i - \bar{y})^2$ completes the set of summary statistics alongside $S_{xx}$ and $S_{xy}$.&lt;&#x2F;li&gt;
&lt;li&gt;The sample standard deviations $S_x = \sqrt{S_{xx}&#x2F;(n-1)}$ and $S_y = \sqrt{S_{yy}&#x2F;(n-1)}$ measure spread.&lt;&#x2F;li&gt;
&lt;li&gt;The correlation $r = S_{xy}&#x2F;\sqrt{S_{xx} \cdot S_{yy}}$ quantifies the strength of the linear relationship.&lt;&#x2F;li&gt;
&lt;li&gt;$r$ is bounded between $-1$ and $1$, symmetric, and dimensionless.&lt;&#x2F;li&gt;
&lt;li&gt;The regression slope and correlation are related by $\hat{\beta}_1 = r \cdot S_y &#x2F; S_x$.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;In the next post, we will define $R^2$ using SST, SSE, and SSR, and prove mathematically that $R^2 = r^2$ for simple linear regression.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Polynomial Regression</title>
        <published>2026-01-16T00:00:00+00:00</published>
        <updated>2026-01-16T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/notes/regression-03-polynomial-regression/"/>
        <id>https://jienweng.github.io/notes/regression-03-polynomial-regression/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/notes/regression-03-polynomial-regression/">&lt;p&gt;This note explains polynomial regression as a linear model in transformed features, not a different estimation framework. The motivation is to capture nonlinear trends while retaining familiar regression machinery. I highlight when polynomial terms help and when they mainly increase variance.&lt;&#x2F;p&gt;
&lt;p&gt;In the &lt;a href=&quot;&#x2F;posts&#x2F;regression-02-multiple-linear-regression&quot;&gt;previous post&lt;&#x2F;a&gt;, we introduced multiple linear regression using matrix notation. One natural question arises: what if the relationship between $x$ and $y$ is not a straight line? Polynomial regression addresses this by fitting a polynomial function of a single variable $x$, and it turns out to be a special case of the multiple linear regression framework we have already developed.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-polynomial-model&quot;&gt;The Polynomial Model&lt;&#x2F;h2&gt;
&lt;p&gt;A polynomial regression model of degree $d$ takes the form:&lt;&#x2F;p&gt;
&lt;p&gt;$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_d x_i^d + \varepsilon_i.$$&lt;&#x2F;p&gt;
&lt;p&gt;This model is nonlinear in the variable $x$, but it is still linear in the parameters $\beta_0, \beta_1, \ldots, \beta_d$. This distinction is important because it means we can use the same OLS estimation procedure from multiple linear regression.&lt;&#x2F;p&gt;
&lt;p&gt;For example, a quadratic model ($d = 2$) is:&lt;&#x2F;p&gt;
&lt;p&gt;$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i,$$&lt;&#x2F;p&gt;
&lt;p&gt;and a cubic model ($d = 3$) is:&lt;&#x2F;p&gt;
&lt;p&gt;$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \varepsilon_i.$$&lt;&#x2F;p&gt;
&lt;h2 id=&quot;design-matrix-construction&quot;&gt;Design Matrix Construction&lt;&#x2F;h2&gt;
&lt;p&gt;To fit a polynomial regression using the matrix approach, we define new variables:&lt;&#x2F;p&gt;
&lt;p&gt;$$z_1 = x, \quad z_2 = x^2, \quad z_3 = x^3, \quad \ldots, \quad z_d = x^d.$$&lt;&#x2F;p&gt;
&lt;p&gt;The design matrix for a polynomial of degree $d$ with $n$ observations is:&lt;&#x2F;p&gt;
&lt;p&gt;$$\mathbf{X} = \begin{pmatrix} 1 &amp;amp; x_1 &amp;amp; x_1^2 &amp;amp; \cdots &amp;amp; x_1^d \\ 1 &amp;amp; x_2 &amp;amp; x_2^2 &amp;amp; \cdots &amp;amp; x_2^d \\ \vdots &amp;amp; \vdots &amp;amp; \vdots &amp;amp; \ddots &amp;amp; \vdots \\ 1 &amp;amp; x_n &amp;amp; x_n^2 &amp;amp; \cdots &amp;amp; x_n^d \end{pmatrix}.$$&lt;&#x2F;p&gt;
&lt;p&gt;This is exactly the design matrix for a multiple linear regression with $d$ predictors $z_1, z_2, \ldots, z_d$. Therefore, the OLS estimator is:&lt;&#x2F;p&gt;
&lt;p&gt;$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y},$$&lt;&#x2F;p&gt;
&lt;p&gt;which is the same formula we derived in the previous post.&lt;&#x2F;p&gt;
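&lt;p&gt;A quadratic fit along these lines can be sketched in a few lines of NumPy. The data here are synthetic, generated from a known quadratic so we can see the estimator recover it:&lt;&#x2F;p&gt;

```python
import numpy as np

# Synthetic data from a known quadratic, for illustration.
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 30)
y = 1.0 + 0.5 * x - 2.0 * x**2 + rng.normal(0, 0.1, size=x.shape)

# Design matrix with columns 1, x, x^2 (degree d = 2).
X = np.vander(x, 3, increasing=True)

# OLS fit; lstsq solves the normal equations in a numerically stable way.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)  # close to the true coefficients [1.0, 0.5, -2.0]
```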
&lt;h2 id=&quot;why-polynomial-regression-is-a-special-case-of-mlr&quot;&gt;Why Polynomial Regression is a Special Case of MLR&lt;&#x2F;h2&gt;
&lt;p&gt;The key insight is that &quot;linear&quot; in &quot;linear regression&quot; refers to linearity in the parameters, not in the predictors. Although the polynomial model includes terms like $x^2$ and $x^3$, each coefficient $\beta_j$ appears linearly. We can treat each power of $x$ as a separate predictor variable, and the entire OLS theory from multiple linear regression applies directly.&lt;&#x2F;p&gt;
&lt;p&gt;This means all the results we derived earlier carry over:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;The normal equations $\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{Y}$ hold.&lt;&#x2F;li&gt;
&lt;li&gt;The hat matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ projects $\mathbf{Y}$ onto $\hat{\mathbf{Y}}$.&lt;&#x2F;li&gt;
&lt;li&gt;Residual properties remain the same.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;choosing-the-degree&quot;&gt;Choosing the Degree&lt;&#x2F;h2&gt;
&lt;p&gt;A natural question is: what degree $d$ should we use? The choice involves a tradeoff between model flexibility and model complexity.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Too low a degree&lt;&#x2F;strong&gt; (underfitting): The model is not flexible enough to capture the true relationship, leading to large systematic errors.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Too high a degree&lt;&#x2F;strong&gt; (overfitting): The model fits the training data very closely, including the noise, but performs poorly on new data.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;A polynomial of degree $n - 1$ (where $n$ is the number of data points, assuming the $x_i$ are distinct) can pass through every data point exactly, giving $SSE = 0$. However, such a model almost always overfits and generalizes badly.&lt;&#x2F;p&gt;
&lt;p&gt;Common approaches to select the degree include:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Visual inspection&lt;&#x2F;strong&gt;: Plot the data and the fitted curve for different degrees, and choose the one that captures the trend without fitting the noise.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Cross-validation&lt;&#x2F;strong&gt;: Split the data into training and validation sets, fit models of various degrees on the training set, and select the degree with the lowest prediction error on the validation set.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Information criteria&lt;&#x2F;strong&gt;: Use metrics such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) to balance goodness of fit with model complexity.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;h2 id=&quot;multicollinearity-considerations&quot;&gt;Multicollinearity Considerations&lt;&#x2F;h2&gt;
&lt;p&gt;One practical concern with polynomial regression is multicollinearity. The predictors $x, x^2, x^3, \ldots$ are often highly correlated, especially when $x$ values span a narrow range. High multicollinearity inflates the variance of the coefficient estimates and makes $\mathbf{X}^T\mathbf{X}$ nearly singular.&lt;&#x2F;p&gt;
&lt;p&gt;A common remedy is to center the predictor before constructing polynomial terms. Instead of using $x$, we use $x - \bar{x}$:&lt;&#x2F;p&gt;
&lt;p&gt;$$z_1 = x - \bar{x}, \quad z_2 = (x - \bar{x})^2, \quad z_3 = (x - \bar{x})^3, \quad \ldots$$&lt;&#x2F;p&gt;
&lt;p&gt;Centering reduces the correlation among the polynomial terms and improves the numerical stability of the estimation.&lt;&#x2F;p&gt;
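&lt;p&gt;The effect of centering is striking when $x$ spans a narrow range, as a quick NumPy check shows (the range below is invented for illustration):&lt;&#x2F;p&gt;

```python
import numpy as np

# Predictor confined to a narrow positive range, for illustration.
x = np.linspace(10, 12, 50)

# Correlation between x and x^2 before centering: nearly 1.
raw = np.corrcoef(x, x**2)[0, 1]

# After centering, the odd and even terms decorrelate almost entirely.
xc = x - x.mean()
centered = np.corrcoef(xc, xc**2)[0, 1]

print(raw, centered)  # near 1.0 versus near 0.0
```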
&lt;p&gt;Another approach is to use orthogonal polynomials, which are constructed so that $\mathbf{X}^T\mathbf{X}$ is diagonal, eliminating multicollinearity entirely.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;example-quadratic-fit&quot;&gt;Example: Quadratic Fit&lt;&#x2F;h2&gt;
&lt;p&gt;Consider a quadratic model $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i$ with the design matrix:&lt;&#x2F;p&gt;
&lt;p&gt;$$\mathbf{X} = \begin{pmatrix} 1 &amp;amp; x_1 &amp;amp; x_1^2 \\ 1 &amp;amp; x_2 &amp;amp; x_2^2 \\ \vdots &amp;amp; \vdots &amp;amp; \vdots \\ 1 &amp;amp; x_n &amp;amp; x_n^2 \end{pmatrix}.$$&lt;&#x2F;p&gt;
&lt;p&gt;There are $p = 3$ parameters. The OLS solution $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$ gives us $\hat{\beta}_0$, $\hat{\beta}_1$, and $\hat{\beta}_2$ simultaneously. The coefficient $\hat{\beta}_2$ indicates the curvature of the fitted parabola: a positive $\hat{\beta}_2$ means the curve opens upward, and a negative $\hat{\beta}_2$ means it opens downward.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;&#x2F;h2&gt;
&lt;p&gt;In this post, we explored polynomial regression:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;The polynomial model $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_d x_i^d + \varepsilon_i$ is nonlinear in $x$ but linear in the parameters.&lt;&#x2F;li&gt;
&lt;li&gt;By treating each power of $x$ as a separate predictor, polynomial regression becomes a special case of multiple linear regression.&lt;&#x2F;li&gt;
&lt;li&gt;The OLS estimator $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$ applies directly.&lt;&#x2F;li&gt;
&lt;li&gt;Choosing the polynomial degree requires balancing fit and complexity to avoid overfitting.&lt;&#x2F;li&gt;
&lt;li&gt;Centering or using orthogonal polynomials helps address multicollinearity.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;In the next post, we will examine correlation, which quantifies the strength of the linear relationship between two variables using $S_{xx}$, $S_{yy}$, and $S_{xy}$.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Multiple Linear Regression</title>
        <published>2026-01-09T00:00:00+00:00</published>
        <updated>2026-01-09T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/notes/regression-02-multiple-linear-regression/"/>
        <id>https://jienweng.github.io/notes/regression-02-multiple-linear-regression/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/notes/regression-02-multiple-linear-regression/">&lt;p&gt;This note extends linear regression to multiple predictors and explains how interpretation changes once features interact through a shared model. The practical challenge is understanding coefficients conditionally, not in isolation. I focus on the model form, estimation intuition, and common interpretation mistakes.&lt;&#x2F;p&gt;
&lt;p&gt;In the &lt;a href=&quot;&#x2F;posts&#x2F;regression-01-simple-linear-regression&quot;&gt;previous post&lt;&#x2F;a&gt;, we explored simple linear regression with a single predictor. In practice, the response variable $y$ often depends on more than one predictor. Multiple linear regression extends the framework to accommodate $p - 1$ independent variables.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-model&quot;&gt;The Model&lt;&#x2F;h2&gt;
&lt;p&gt;The multiple linear regression model for the $i$-th observation is:&lt;&#x2F;p&gt;
&lt;p&gt;$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i,$$&lt;&#x2F;p&gt;
&lt;p&gt;where $x_{ij}$ is the value of the $j$-th predictor for the $i$-th observation, $\beta_j$ are the regression coefficients, and $\varepsilon_i$ is the error term. The total number of parameters is $p$ (including the intercept $\beta_0$).&lt;&#x2F;p&gt;
&lt;h2 id=&quot;matrix-notation&quot;&gt;Matrix Notation&lt;&#x2F;h2&gt;
&lt;p&gt;Writing out the model for each observation individually becomes cumbersome as the number of predictors grows. Matrix notation provides a compact and powerful alternative.&lt;&#x2F;p&gt;
&lt;p&gt;Define the following:&lt;&#x2F;p&gt;
&lt;p&gt;$$\mathbf{Y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} 1 &amp;amp; x_{11} &amp;amp; x_{12} &amp;amp; \cdots &amp;amp; x_{1,p-1} \\ 1 &amp;amp; x_{21} &amp;amp; x_{22} &amp;amp; \cdots &amp;amp; x_{2,p-1} \\ \vdots &amp;amp; \vdots &amp;amp; \vdots &amp;amp; \ddots &amp;amp; \vdots \\ 1 &amp;amp; x_{n1} &amp;amp; x_{n2} &amp;amp; \cdots &amp;amp; x_{n,p-1} \end{pmatrix},$$&lt;&#x2F;p&gt;
&lt;p&gt;$$\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{pmatrix}, \quad \boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$&lt;&#x2F;p&gt;
&lt;p&gt;The model can now be written as:&lt;&#x2F;p&gt;
&lt;p&gt;$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}.$$&lt;&#x2F;p&gt;
&lt;p&gt;Here, $\mathbf{Y}$ is an $n \times 1$ vector of responses, $\mathbf{X}$ is an $n \times p$ design matrix (the first column of ones accounts for the intercept), $\boldsymbol{\beta}$ is a $p \times 1$ vector of coefficients, and $\boldsymbol{\varepsilon}$ is an $n \times 1$ vector of errors.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;ols-in-matrix-form&quot;&gt;OLS in Matrix Form&lt;&#x2F;h2&gt;
&lt;p&gt;The sum of squared errors can be written in matrix form as:&lt;&#x2F;p&gt;
&lt;p&gt;$$SSE = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}).$$&lt;&#x2F;p&gt;
&lt;p&gt;Expanding this expression:&lt;&#x2F;p&gt;
&lt;p&gt;$$SSE = \mathbf{Y}^T\mathbf{Y} - 2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{Y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}.$$&lt;&#x2F;p&gt;
&lt;p&gt;To minimize, we take the derivative with respect to $\boldsymbol{\beta}$ and set it to zero:&lt;&#x2F;p&gt;
&lt;p&gt;$$\frac{\partial SSE}{\partial \boldsymbol{\beta}} = -2\mathbf{X}^T\mathbf{Y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{0}.$$&lt;&#x2F;p&gt;
&lt;p&gt;This gives us the normal equations in matrix form:&lt;&#x2F;p&gt;
&lt;p&gt;$$\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{Y}.$$&lt;&#x2F;p&gt;
&lt;p&gt;Provided that $\mathbf{X}^T\mathbf{X}$ is invertible (that is, the columns of $\mathbf{X}$ are linearly independent), we can solve for the OLS estimator:&lt;&#x2F;p&gt;
&lt;p&gt;$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}.$$&lt;&#x2F;p&gt;
&lt;p&gt;This single formula generalizes the simple linear regression result. When $p = 2$ (one predictor plus the intercept), this reduces to $\hat{\beta}_1 = S_{xy}&#x2F;S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$ as derived in the previous post.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;interpreting-the-coefficients&quot;&gt;Interpreting the Coefficients&lt;&#x2F;h2&gt;
&lt;p&gt;Each coefficient $\hat{\beta}_j$ (for $j = 1, 2, \ldots, p - 1$) represents the estimated change in $y$ for a one-unit increase in $x_j$, while holding all other predictors constant. This &quot;holding other variables constant&quot; interpretation is what distinguishes multiple regression from running separate simple regressions.&lt;&#x2F;p&gt;
&lt;p&gt;The intercept $\hat{\beta}_0$ represents the estimated value of $y$ when all predictors are equal to zero.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-hat-matrix&quot;&gt;The Hat Matrix&lt;&#x2F;h2&gt;
&lt;p&gt;The vector of fitted values is:&lt;&#x2F;p&gt;
&lt;p&gt;$$\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y} = \mathbf{H}\mathbf{Y},$$&lt;&#x2F;p&gt;
&lt;p&gt;where $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ is called the hat matrix. It &quot;puts a hat on&quot; $\mathbf{Y}$, transforming observed values into fitted values. The hat matrix is symmetric ($\mathbf{H}^T = \mathbf{H}$) and idempotent ($\mathbf{H}^2 = \mathbf{H}$).&lt;&#x2F;p&gt;
&lt;p&gt;The residual vector is:&lt;&#x2F;p&gt;
&lt;p&gt;$$\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = (\mathbf{I} - \mathbf{H})\mathbf{Y}.$$&lt;&#x2F;p&gt;
&lt;h2 id=&quot;assumptions&quot;&gt;Assumptions&lt;&#x2F;h2&gt;
&lt;p&gt;The assumptions for multiple linear regression extend those of simple linear regression:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Linearity&lt;&#x2F;strong&gt;: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, the model is linear in the parameters.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Full rank&lt;&#x2F;strong&gt;: The design matrix $\mathbf{X}$ has full column rank, meaning $\text{rank}(\mathbf{X}) = p$. This ensures $\mathbf{X}^T\mathbf{X}$ is invertible.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Exogeneity&lt;&#x2F;strong&gt;: $E(\boldsymbol{\varepsilon} | \mathbf{X}) = \mathbf{0}$, the errors have zero conditional mean.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Homoscedasticity&lt;&#x2F;strong&gt;: $\text{Var}(\boldsymbol{\varepsilon} | \mathbf{X}) = \sigma^2\mathbf{I}_n$, the errors have constant variance and are uncorrelated.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Normality&lt;&#x2F;strong&gt; (for inference): $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I}_n)$.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;When assumptions 1 through 4 hold, the Gauss-Markov theorem guarantees that $\hat{\boldsymbol{\beta}}$ is the Best Linear Unbiased Estimator (BLUE).&lt;&#x2F;p&gt;
&lt;h2 id=&quot;connection-to-simple-linear-regression&quot;&gt;Connection to Simple Linear Regression&lt;&#x2F;h2&gt;
&lt;p&gt;In the special case where $p = 2$, we have a single predictor $x$ and the design matrix becomes:&lt;&#x2F;p&gt;
&lt;p&gt;$$\mathbf{X} = \begin{pmatrix} 1 &amp;amp; x_1 \\ 1 &amp;amp; x_2 \\ \vdots &amp;amp; \vdots \\ 1 &amp;amp; x_n \end{pmatrix}.$$&lt;&#x2F;p&gt;
&lt;p&gt;Computing $(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$ in this case yields the familiar results $\hat{\beta}_1 = S_{xy}&#x2F;S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$, confirming that the matrix formulation is a true generalization.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;&#x2F;h2&gt;
&lt;p&gt;In this post, we extended the regression framework to handle multiple predictors:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;The model $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ uses matrix notation to express the relationship compactly.&lt;&#x2F;li&gt;
&lt;li&gt;The OLS estimator $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$ generalizes the simple linear regression solution.&lt;&#x2F;li&gt;
&lt;li&gt;Each $\hat{\beta}_j$ measures the effect of one predictor while holding the others constant.&lt;&#x2F;li&gt;
&lt;li&gt;The hat matrix $\mathbf{H}$ projects observed values onto fitted values.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
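&lt;p&gt;The hat matrix deserves a quick numerical illustration. The sketch below uses a small randomly generated design matrix (purely hypothetical data) to confirm that $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ maps $\mathbf{Y}$ to the fitted values and is idempotent, as a projection should be:&lt;&#x2F;p&gt;

```python
import numpy as np

# Hypothetical design matrix (intercept plus two predictors) and response
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(6), rng.normal(size=6), rng.normal(size=6)])
Y = rng.normal(size=6)

# Hat matrix H = X (X^T X)^{-1} X^T
H = X @ np.linalg.inv(X.T @ X) @ X.T

# H Y equals the fitted values X beta_hat
beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ Y)
fitted = X @ beta_hat
print(np.allclose(H @ Y, fitted))  # True: H projects Y onto the fitted values
print(np.allclose(H @ H, H))       # True: H is idempotent (a projection)
```

&lt;p&gt;Symmetry and idempotency ($\mathbf{H}^2 = \mathbf{H}$) are the defining algebraic properties of an orthogonal projection matrix.&lt;&#x2F;p&gt;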
&lt;p&gt;In the next post, we will explore polynomial regression, which is a special case of multiple linear regression where the predictors are powers of a single variable.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Simple Linear Regression</title>
        <published>2026-01-02T00:00:00+00:00</published>
        <updated>2026-01-02T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/notes/regression-01-simple-linear-regression/"/>
        <id>https://jienweng.github.io/notes/regression-01-simple-linear-regression/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/notes/regression-01-simple-linear-regression/">&lt;p&gt;This note introduces simple linear regression from first principles and focuses on how slope and intercept are estimated from data. The problem it solves is modeling a linear relationship between one predictor and one response. It is intended as the base layer for the later regression notes in this series.&lt;&#x2F;p&gt;
&lt;p&gt;In secondary school, we learn that the equation of a straight line is given by $y = mx + c$, where $m$ is the slope and $c$ is the y-intercept. In statistics and machine learning, we use a similar but more general form to model the relationship between a dependent variable $y$ and an independent variable $x$. This is known as simple linear regression.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-model&quot;&gt;The Model&lt;&#x2F;h2&gt;
&lt;p&gt;In simple linear regression, we express the relationship as:&lt;&#x2F;p&gt;
&lt;p&gt;$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i,$$&lt;&#x2F;p&gt;
&lt;p&gt;where $\hat{y}_i$ is the predicted value of the dependent variable for the $i$-th observation, $\hat{\beta}_0$ is the estimated y-intercept, and $\hat{\beta}_1$ is the estimated slope coefficient. The &quot;hat&quot; notation indicates that these are estimates derived from data, not the true (unknown) population parameters $\beta_0$ and $\beta_1$.&lt;&#x2F;p&gt;
&lt;p&gt;The true model is assumed to be:&lt;&#x2F;p&gt;
&lt;p&gt;$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,$$&lt;&#x2F;p&gt;
&lt;p&gt;where $\varepsilon_i$ represents the random error term for the $i$-th observation.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;ordinary-least-squares-ols&quot;&gt;Ordinary Least Squares (OLS)&lt;&#x2F;h2&gt;
&lt;p&gt;The question is: how do we find the best estimates $\hat{\beta}_0$ and $\hat{\beta}_1$? We need a systematic method that determines the line of best fit. The most common approach is Ordinary Least Squares (OLS).&lt;&#x2F;p&gt;
&lt;p&gt;OLS minimizes the sum of the squared differences between the observed values $y_i$ and the predicted values $\hat{y}_i$. These differences are called residuals, defined as $e_i = y_i - \hat{y}_i$. By squaring the residuals, we treat positive and negative errors equally and penalize larger deviations more heavily.&lt;&#x2F;p&gt;
&lt;p&gt;The objective function is the Sum of Squared Errors (SSE):&lt;&#x2F;p&gt;
&lt;p&gt;$$SSE = \sum^n_{i=1}(y_i - \hat{y}_i)^2 = \sum^n_{i=1}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2.$$&lt;&#x2F;p&gt;
&lt;h2 id=&quot;deriving-the-normal-equations&quot;&gt;Deriving the Normal Equations&lt;&#x2F;h2&gt;
&lt;p&gt;To minimize the SSE, we take partial derivatives with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$ and set them equal to zero.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;partial-derivative-with-respect-to-hat-beta-0&quot;&gt;Partial derivative with respect to $\hat{\beta}_0$&lt;&#x2F;h3&gt;
&lt;p&gt;$$\frac{\partial SSE}{\partial \hat{\beta}_0} = -2\sum^n_{i=1}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0.$$&lt;&#x2F;p&gt;
&lt;p&gt;Dividing both sides by $-2$ and expanding the sum:&lt;&#x2F;p&gt;
&lt;p&gt;$$\sum^n_{i=1} y_i - n\hat{\beta}_0 - \hat{\beta}_1 \sum^n_{i=1} x_i = 0.$$&lt;&#x2F;p&gt;
&lt;p&gt;Solving for $\hat{\beta}_0$:&lt;&#x2F;p&gt;
&lt;p&gt;$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x},$$&lt;&#x2F;p&gt;
&lt;p&gt;where $\bar{x} = \frac{1}{n}\sum^n_{i=1}x_i$ and $\bar{y} = \frac{1}{n}\sum^n_{i=1}y_i$ are the sample means.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;partial-derivative-with-respect-to-hat-beta-1&quot;&gt;Partial derivative with respect to $\hat{\beta}_1$&lt;&#x2F;h3&gt;
&lt;p&gt;$$\frac{\partial SSE}{\partial \hat{\beta}_1} = -2\sum^n_{i=1}x_i(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0.$$&lt;&#x2F;p&gt;
&lt;p&gt;Dividing by $-2$ and expanding:&lt;&#x2F;p&gt;
&lt;p&gt;$$\sum^n_{i=1} x_i y_i - \hat{\beta}_0 \sum^n_{i=1} x_i - \hat{\beta}_1 \sum^n_{i=1} x_i^2 = 0.$$&lt;&#x2F;p&gt;
&lt;p&gt;Substituting $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$:&lt;&#x2F;p&gt;
&lt;p&gt;$$\sum^n_{i=1} x_i y_i - (\bar{y} - \hat{\beta}_1 \bar{x})\sum^n_{i=1} x_i - \hat{\beta}_1 \sum^n_{i=1} x_i^2 = 0.$$&lt;&#x2F;p&gt;
&lt;p&gt;After simplification, we obtain:&lt;&#x2F;p&gt;
&lt;p&gt;$$\hat{\beta}_1 = \frac{\sum^n_{i=1}(x_i - \bar{x})(y_i - \bar{y})}{\sum^n_{i=1}(x_i - \bar{x})^2}.$$&lt;&#x2F;p&gt;
&lt;h2 id=&quot;introducing-s-xx-and-s-xy-notation&quot;&gt;Introducing $S_{xx}$ and $S_{xy}$ Notation&lt;&#x2F;h2&gt;
&lt;p&gt;To write the estimators more concisely, we define the following summary statistics:&lt;&#x2F;p&gt;
&lt;p&gt;$$S_{xx} = \sum^n_{i=1}(x_i - \bar{x})^2 = \sum^n_{i=1}x_i^2 - n\bar{x}^2,$$&lt;&#x2F;p&gt;
&lt;p&gt;$$S_{xy} = \sum^n_{i=1}(x_i - \bar{x})(y_i - \bar{y}) = \sum^n_{i=1}x_i y_i - n\bar{x}\bar{y}.$$&lt;&#x2F;p&gt;
&lt;p&gt;Using this notation, the OLS estimators become:&lt;&#x2F;p&gt;
&lt;p&gt;$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}},$$&lt;&#x2F;p&gt;
&lt;p&gt;$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$&lt;&#x2F;p&gt;
&lt;p&gt;These are elegant expressions that reveal the structure of the estimates. The slope $\hat{\beta}_1$ is the ratio of the joint variability of $x$ and $y$ (captured by $S_{xy}$) to the variability of $x$ alone (captured by $S_{xx}$). The intercept $\hat{\beta}_0$ ensures the regression line passes through the point $(\bar{x}, \bar{y})$.&lt;&#x2F;p&gt;
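&lt;p&gt;The estimators are straightforward to compute directly. The snippet below is a minimal sketch using a small made-up dataset (the numbers are illustrative, not from this post) and the shortcut forms of $S_{xx}$ and $S_{xy}$:&lt;&#x2F;p&gt;

```python
import numpy as np

# Hypothetical data (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.1, 3.8, 5.1])

x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum(x ** 2) - len(x) * x_bar ** 2        # shortcut form
S_xy = np.sum(x * y) - len(x) * x_bar * y_bar      # shortcut form

beta1_hat = S_xy / S_xx
beta0_hat = y_bar - beta1_hat * x_bar
print(beta1_hat, beta0_hat)  # slope ≈ 0.97, intercept ≈ 0.11
```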
&lt;h2 id=&quot;residuals-and-fitted-values&quot;&gt;Residuals and Fitted Values&lt;&#x2F;h2&gt;
&lt;p&gt;Once we have $\hat{\beta}_0$ and $\hat{\beta}_1$, we can compute:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Fitted values&lt;&#x2F;strong&gt;: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ for each observation.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Residuals&lt;&#x2F;strong&gt;: $e_i = y_i - \hat{y}_i$ for each observation.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Two important properties of OLS residuals are worth noting:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;The residuals sum to zero: $\sum^n_{i=1} e_i = 0$.&lt;&#x2F;li&gt;
&lt;li&gt;The residuals are uncorrelated with the fitted values: $\sum^n_{i=1} e_i \hat{y}_i = 0$.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;These properties follow directly from the normal equations.&lt;&#x2F;p&gt;
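&lt;p&gt;Both residual properties can be verified numerically on any dataset, since they follow from the normal equations rather than from the data. A minimal check with hypothetical numbers:&lt;&#x2F;p&gt;

```python
import numpy as np

# Hypothetical data; any dataset works since the properties are algebraic
x = np.array([0.5, 1.5, 2.0, 3.5, 4.0, 5.5])
y = np.array([1.1, 2.0, 2.3, 3.9, 4.2, 5.8])

# OLS estimates
S_xx = np.sum((x - x.mean()) ** 2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = S_xy / S_xx
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x   # fitted values
e = y - y_hat         # residuals

print(np.isclose(e.sum(), 0.0))            # True: residuals sum to zero
print(np.isclose(np.sum(e * y_hat), 0.0))  # True: uncorrelated with fitted values
```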
&lt;h2 id=&quot;key-assumptions&quot;&gt;Key Assumptions&lt;&#x2F;h2&gt;
&lt;p&gt;For OLS to produce reliable estimates, the following assumptions are typically required:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Linearity&lt;&#x2F;strong&gt;: The relationship between $x$ and $y$ is linear in the parameters.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Independence&lt;&#x2F;strong&gt;: The observations are independent of one another.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Homoscedasticity&lt;&#x2F;strong&gt;: The variance of the error terms $\varepsilon_i$ is constant across all values of $x$.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Normality&lt;&#x2F;strong&gt;: The error terms are normally distributed, that is, $\varepsilon_i \sim N(0, \sigma^2)$.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;When these assumptions hold, OLS produces the Best Linear Unbiased Estimators (BLUE) according to the Gauss-Markov theorem.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;&#x2F;h2&gt;
&lt;p&gt;In this post, we covered the fundamentals of simple linear regression:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;The model $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ describes the estimated linear relationship between $x$ and $y$.&lt;&#x2F;li&gt;
&lt;li&gt;OLS minimizes the sum of squared errors to find the best-fitting line.&lt;&#x2F;li&gt;
&lt;li&gt;The estimators $\hat{\beta}&lt;em&gt;1 = S&lt;&#x2F;em&gt;{xy}&#x2F;S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$ are derived from the normal equations.&lt;&#x2F;li&gt;
&lt;li&gt;The $S_{xx}$ and $S_{xy}$ notation provides a compact way to express these results.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;In the next post, we will extend this framework to handle multiple independent variables through multiple linear regression.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Let&#x27;s talk about Solar Photovoltaic Systems in WWTPs Malaysia</title>
        <published>2025-12-01T00:00:00+00:00</published>
        <updated>2025-12-02T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/notes/photovoltaic-systems-in-wwtps-malaysia/"/>
        <id>https://jienweng.github.io/notes/photovoltaic-systems-in-wwtps-malaysia/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/notes/photovoltaic-systems-in-wwtps-malaysia/">&lt;p&gt;This note reviews how solar photovoltaic systems can reduce energy cost and emissions in wastewater treatment plants (WWTPs) in Malaysia. The core problem is that WWTP operations are electricity-intensive and exposed to tariff volatility. I summarize constraints, control architecture, and where ML-based forecasting can improve deployment decisions.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;why-solar-pv-in-wwtps&quot;&gt;Why Solar PV in WWTPs?&lt;&#x2F;h1&gt;
&lt;p&gt;Wastewater treatment facilities are globally recognized as energy-intensive operations, with electrical energy accounting for a significant portion of their operational expenditure (OPEX). For conventional WWTPs, energy use contributes between 25% and 60% of total OPEX &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Muzaffar2022&lt;&#x2F;sup&gt;
. This dependency on the central electricity grid exposes these facilities to electricity tariff volatility and supply uncertainties. The national sewerage company, Indah Water Konsortium (IWK) Sdn Bhd, has seen electricity costs balloon from RM22.53 million in 2000 to RM256.30 million in 2020 &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;IWK2021&lt;&#x2F;sup&gt;
. This became the primary motivation for IWK to explore renewable energy sources, particularly solar PV systems, to mitigate energy costs and enhance sustainability.&lt;&#x2F;p&gt;
&lt;p&gt;IWK operates as Malaysia&#x27;s national sewerage company and manages a vast network of public treatment plants. As of December 2021, IWK operated and maintained 7,272 public Sewerage Treatment Plants (STPs) and 1,375 network pump stations across the country &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;IWK2021_report&lt;&#x2F;sup&gt;
. Regulatory oversight is provided by the Suruhanjaya Perkhidmatan Air Negara (SPAN), which ensures that operators comply with national water quality standards and contractual obligations.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;energy-demand-characteristics-of-wwtps&quot;&gt;Energy Demand Characteristics of WWTPs&lt;&#x2F;h1&gt;
&lt;p&gt;Before implementing solar PV systems, it is crucial to understand the energy demand characteristics of WWTPs in order to size and optimize the PV system correctly. The energy consumption profile of a WWTP is highly influenced by the plant scale and the dominant treatment technology used.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;specific-energy-consumption-sec-and-categorization&quot;&gt;Specific Energy Consumption (SEC) and Categorization&lt;&#x2F;h2&gt;
&lt;p&gt;Energy intensity, generally measured as Specific Energy Consumption (SEC) in kilowatt-hours per cubic meter (kWh&#x2F;m³) of treated wastewater, is a key metric for evaluating the energy efficiency of WWTPs. SEC values can vary significantly with the treatment processes employed and the plant&#x27;s capacity. Data suggest that smaller WWTPs, particularly those below $10,000 m^3&#x2F;month$ capacity, tend to bear a disproportionately higher electricity cost, accounting for 30%-40% of their total running costs &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Muzaffar2022&lt;&#x2F;sup&gt;
. In contrast, larger WWTPs typically exhibit lower SEC values, with electricity accounting for 15%-30% of total running costs.&lt;&#x2F;p&gt;
&lt;p&gt;Specific data collected from village WWTPs in Romania indicated high annual average SEC values, ranging between $1.786 kWh&#x2F;m^3$ to $2.334 kWh&#x2F;m^3$ of treated wastewater &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Tokos2021&lt;&#x2F;sup&gt;
. Furthermore, sewage sludge disposal volumes have increased rapidly alongside population growth, reaching 7 million $m^3$ annually. Managing this sludge requires significant energy input, contributing to the sector&#x27;s overall energy consumption, with sewage sludge treatment alone consuming an estimated $544,900 GWh$ across IWK operations between 2016 and 2019 &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Quan2022&lt;&#x2F;sup&gt;
.&lt;&#x2F;p&gt;
&lt;p&gt;This observed variability indicates that the primary candidates for immediate solar PV integration are the small to medium-sized conventional STPs. These plants have higher SEC values and a higher dependence on expensive grid power, meaning the marginal return from solar PV energy offset is maximized in this segment.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;identification-of-energy-intensive-processes&quot;&gt;Identification of Energy-Intensive Processes&lt;&#x2F;h2&gt;
&lt;p&gt;The greatest concentration of electrical energy demand within a conventional WWTP is typically found in the biological treatment line (BgT), primarily driven by the aeration systems. Aeration is necessary in activated sludge processes to maintain adequate dissolved oxygen levels for microbial activity. Data confirm that BgT accounts for the majority of total energy consumption, ranging from 63.2% to 72.9% in surveyed WWTPs &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Tokos2021&lt;&#x2F;sup&gt;
.&lt;&#x2F;p&gt;
&lt;p&gt;Whether through diffused air or mechanical aerators (such as those used in &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.epa.gov&#x2F;system&#x2F;files&#x2F;documents&#x2F;2022-10&#x2F;oxidation-ditch-factsheet.pdf&quot;&gt;oxidation ditches&lt;&#x2F;a&gt;), aeration systems require continuous operation to provide oxygen transfer, circulation, and mixing. This continuous, high-power demand during daytime operational hours is highly advantageous for solar PV integration. The steady daytime demand from blowers and aerators means that generated solar power can be immediately and entirely consumed, leading to high utilization rates and diminishing the need for Battery Energy Storage Systems (BESS), which are often a significant cost driver in renewable energy (RE) projects.&lt;&#x2F;p&gt;
&lt;p&gt;The high concentration of energy use in aeration suggests that before implementing solar PV systems, WWTP operators should first improve the energy efficiency of the aeration systems. Simply installing a large PV array without optimizing the blowers and biological processes may lead to an oversized and more costly system. Streamlining the anaerobic biological treatments is possible and should be prioritized to reduce the overall energy demand before sizing the solar PV system.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;advanced-pv-power-forecasting-and-control-systems&quot;&gt;Advanced PV Power Forecasting and Control Systems&lt;&#x2F;h1&gt;
&lt;p&gt;The modern integration of photovoltaic systems into WWTPs requires sophisticated forecasting and control architectures. Recent research demonstrates a clear evolution toward hybrid AI-based forecasting models that combine deep learning with optimization algorithms &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;IturraldeCarrera2025&lt;&#x2F;sup&gt;
. These advanced models significantly outperform classical deterministic methods in handling the highly dynamic and non-linear conditions of real-world PV generation.&lt;&#x2F;p&gt;
&lt;p&gt;Machine learning ensemble algorithms such as XGBoost, LightGBM, and CatBoost have emerged as foundational tools for high-accuracy solar power prediction &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Nguyen2025&lt;&#x2F;sup&gt;
. A critical finding is that humidity and ambient temperature emerge as the most influential factors affecting PV module efficiency, particularly in tropical and humid climates.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;integrated-energy-management&quot;&gt;Integrated Energy Management&lt;&#x2F;h2&gt;
&lt;p&gt;Effective PV system integration requires a multi-layered control hierarchy. The foundational layer is Maximum Power Point Tracking (MPPT), which continuously adjusts the DC-DC converter to extract maximum available power from the solar array. A critical technical constraint in many jurisdictions is the Zero Export requirement, which prohibits injecting electrical power back into the grid. To comply, Zero Export Controllers (ZEC) operate on near-instantaneous feedback loops &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Alnawafah2025&lt;&#x2F;sup&gt;
.&lt;&#x2F;p&gt;
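&lt;p&gt;To make the zero-export idea concrete, here is a deliberately simplified sketch of the curtailment logic. The function name, the capping rule, and the margin parameter are all assumptions for illustration; a real ZEC operates on fast metering feedback and inverter setpoints:&lt;&#x2F;p&gt;

```python
def curtail_pv(pv_available_kw, site_load_kw, margin_kw=1.0):
    """Limit PV output so the site never exports power to the grid.

    pv_available_kw : power the array could produce right now
    site_load_kw    : current plant demand (e.g. aeration blowers)
    margin_kw       : safety headroom kept below the measured load
    All names and the rule itself are illustrative, not a real controller API.
    """
    # Cap output at the load minus a headroom margin, never below zero
    ceiling = max(site_load_kw - margin_kw, 0.0)
    return min(pv_available_kw, ceiling)

# Midday: the array could make 120 kW but the load is 100 kW, so output is capped
print(curtail_pv(120.0, 100.0))  # 99.0
# Morning: load exceeds available PV, so no curtailment is needed
print(curtail_pv(40.0, 100.0))   # 40.0
```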
&lt;p&gt;Battery Energy Storage Systems provide flexibility to address both PV intermittency and zero-export constraints. Advanced control algorithms optimize charging during low-cost periods and discharging during peak tariff hours &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Hvala2025&lt;&#x2F;sup&gt;
.&lt;&#x2F;p&gt;
&lt;!-- # Demand-Side Optimization

Maximizing PV self-consumption requires accurate prediction of not only generation but also internal treatment demand. Dynamic ensemble models utilizing machine learning have been successfully applied to predict water quality characteristics such as COD and TN, reducing prediction errors to 9.5%-15.2% MAPE &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Yu2025&lt;&#x2F;sup&gt;
. This intelligent demand-side management enables facilities to achieve substantial energy consumption reductions while modulating energy draw in response to predicted PV output. --&gt;
&lt;!-- # Floating Photovoltaic Systems

Floating Photovoltaic (FPV) installations over basins represent an emerging solution to land-use constraints. FPV systems provide inherent performance advantages through water cooling effects that counteract thermal degradation. By actively mitigating thermal stress, FPV improves both instantaneous yield and long-term reliability &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Selj2025&lt;&#x2F;sup&gt;
.

The aquatic deployment environment introduces unique technical challenges requiring specialized components such as double-glass laminated modules and IP68-rated electrical components to ensure durability in humid and corrosive conditions. --&gt;
&lt;h1 id=&quot;conclusion&quot;&gt;Conclusion&lt;&#x2F;h1&gt;
&lt;p&gt;The successful integration of solar PV systems into Malaysian WWTPs requires a deliberate strategy built on advanced forecasting. By combining accurate generation forecasting with sophisticated demand-side prediction, facilities can achieve near energy-autonomous operation while positioning themselves as flexible power assets responding to grid demands.&lt;&#x2F;p&gt;
&lt;p&gt;In future research, we need to explore the detailed implementation of predictive modelling and control algorithms in WWTPs, as well as conduct in-depth literature reviews of case studies of existing energy prediction systems in WWTPs.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Gradient Descent Algorithm Explained</title>
        <published>2025-11-30T00:00:00+00:00</published>
        <updated>2025-11-30T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/notes/gradient-descent/"/>
        <id>https://jienweng.github.io/notes/gradient-descent/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/notes/gradient-descent/">&lt;p&gt;This note explains gradient descent as an optimization procedure for minimizing differentiable objectives and clarifies the role of the learning rate in convergence behavior. The practical question is not only how the rule works, but when it becomes unstable. The walkthrough focuses on the core math and interpretable examples.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;mathematics-technical-parts&quot;&gt;Mathematics Technical Parts&lt;&#x2F;h2&gt;
&lt;p&gt;Consider a differentiable function $f: \mathbb{R}^n \to \mathbb{R}$. The gradient of $f$ at a point $x \in \mathbb{R}^n$ is denoted $\nabla f(x)$ and is the vector of partial derivatives. The gradient points in the direction of steepest ascent of the function, so to find a local minimum we need to move in the opposite direction. Gradient descent exploits this with the update rule:&lt;&#x2F;p&gt;
&lt;p&gt;$$x_{k+1} = x_k - \alpha \nabla f(x_k),$$&lt;&#x2F;p&gt;
&lt;p&gt;where:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;$x_k$ is the current point,&lt;&#x2F;li&gt;
&lt;li&gt;$\alpha$ is the learning rate, where $\alpha &amp;gt; 0$,&lt;&#x2F;li&gt;
&lt;li&gt;$\nabla f(x_k)$ is the gradient of $f$ at point $x_k$.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;We simulate two different scenarios: one where the loss function has a positive gradient at the starting point, and another where the gradient is negative.&lt;&#x2F;p&gt;
&lt;hr&gt;
&lt;h3 id=&quot;scenario-1-positive-gradient&quot;&gt;Scenario 1: Positive Gradient&lt;&#x2F;h3&gt;
&lt;p&gt;Let&#x27;s consider a simple quadratic function:&lt;&#x2F;p&gt;
&lt;p&gt;$$f(x) = x^2 + 4x + 4.$$&lt;&#x2F;p&gt;
&lt;p&gt;The gradient of this function is:&lt;&#x2F;p&gt;
&lt;p&gt;$$\nabla f(x) = 2x + 4.$$&lt;&#x2F;p&gt;
&lt;p&gt;Starting from an initial point, say $x_0 = 0$, and choosing a learning rate $\alpha = 0.1$, we can apply the gradient descent update rule iteratively:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Compute the gradient at the current point: $\nabla f(x_0) = 2(0) + 4 = 4$.&lt;&#x2F;li&gt;
&lt;li&gt;Update the point: $x_1 = x_0 - 0.1 \cdot 4 = 0 - 0.4 = -0.4$.&lt;&#x2F;li&gt;
&lt;li&gt;Repeat the process for a number of iterations.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;For the first 5 iterations, we can tabulate the results as follows:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Iteration ($k$)&lt;&#x2F;th&gt;&lt;th&gt;Current Point ($x_k$)&lt;&#x2F;th&gt;&lt;th&gt;Gradient ($\nabla f(x_k)$)&lt;&#x2F;th&gt;&lt;th&gt;Updated Point ($x_{k+1}$)&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;0&lt;&#x2F;td&gt;&lt;td&gt;0.0&lt;&#x2F;td&gt;&lt;td&gt;4.0&lt;&#x2F;td&gt;&lt;td&gt;-0.4&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;1&lt;&#x2F;td&gt;&lt;td&gt;-0.4&lt;&#x2F;td&gt;&lt;td&gt;3.2&lt;&#x2F;td&gt;&lt;td&gt;-0.72&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;2&lt;&#x2F;td&gt;&lt;td&gt;-0.72&lt;&#x2F;td&gt;&lt;td&gt;2.56&lt;&#x2F;td&gt;&lt;td&gt;-0.976&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;3&lt;&#x2F;td&gt;&lt;td&gt;-0.976&lt;&#x2F;td&gt;&lt;td&gt;2.048&lt;&#x2F;td&gt;&lt;td&gt;-1.1808&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;4&lt;&#x2F;td&gt;&lt;td&gt;-1.1808&lt;&#x2F;td&gt;&lt;td&gt;1.6384&lt;&#x2F;td&gt;&lt;td&gt;-1.34464&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;We can observe that, starting from a point with a positive gradient, each update moves $x$ in the negative direction, toward the minimum at $x = -2$. The first step is the largest, and the step size gradually decreases as we approach the minimum. This is the brilliant part of gradient descent: because each step is proportional to the gradient, it automatically takes larger steps when we are far from the minimum and smaller steps as we get closer.&lt;&#x2F;p&gt;
&lt;p&gt;We can visualize the process using a simple plot:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo z-code&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;import&lt;&#x2F;span&gt;&lt;span&gt; numpy&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; as&lt;&#x2F;span&gt;&lt;span&gt; np&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;import&lt;&#x2F;span&gt;&lt;span&gt; matplotlib&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;pyplot&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; as&lt;&#x2F;span&gt;&lt;span&gt; plt&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-punctuation z-definition z-comment&quot;&gt;#&lt;&#x2F;span&gt;&lt;span class=&quot;z-comment&quot;&gt; Define the function and its gradient&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-storage z-type&quot;&gt;def&lt;&#x2F;span&gt;&lt;span class=&quot;z-entity z-name&quot;&gt; f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable z-parameter z-function&quot;&gt;x&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;    return&lt;&#x2F;span&gt;&lt;span&gt; x&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;**&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;2&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; +&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 4&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;*&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; +&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 4&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-storage z-type&quot;&gt;def&lt;&#x2F;span&gt;&lt;span class=&quot;z-entity z-name&quot;&gt; grad_f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable z-parameter z-function&quot;&gt;x&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;    return&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 2&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;*&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; +&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 4&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-punctuation z-definition z-comment&quot;&gt;#&lt;&#x2F;span&gt;&lt;span class=&quot;z-comment&quot;&gt; Gradient Descent parameters&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;alpha&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 0.1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;x0&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 0&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;iterations&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 20&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-punctuation z-definition z-comment&quot;&gt;#&lt;&#x2F;span&gt;&lt;span class=&quot;z-comment&quot;&gt; Store the points&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; [&lt;&#x2F;span&gt;&lt;span&gt;x0&lt;&#x2F;span&gt;&lt;span&gt;]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;for&lt;&#x2F;span&gt;&lt;span&gt; _&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; in&lt;&#x2F;span&gt;&lt;span class=&quot;z-support&quot;&gt; range&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;iterations&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    grad&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; grad_f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;[&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;-&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;1&lt;&#x2F;span&gt;&lt;span&gt;]&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    x_new&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; x_points&lt;&#x2F;span&gt;&lt;span&gt;[&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;-&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;1&lt;&#x2F;span&gt;&lt;span&gt;]&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; -&lt;&#x2F;span&gt;&lt;span&gt; alpha&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; *&lt;&#x2F;span&gt;&lt;span&gt; grad&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    x_points&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;append&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_new&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-punctuation z-definition z-comment&quot;&gt;#&lt;&#x2F;span&gt;&lt;span class=&quot;z-comment&quot;&gt; Plotting&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; np&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;linspace&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;-&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;5&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 1&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 100&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;y&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;plot&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span&gt; y&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; label&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;f(x) = x^2 + 4x + 4&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;scatter&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span&gt; f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;np&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;array&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; color&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;red&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;plot&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span&gt; f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;np&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;array&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; color&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;red&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; linestyle&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;--&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; label&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;Gradient Descent Path&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;title&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;Gradient Descent on f(x)&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;xlabel&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;x&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;ylabel&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;f(x)&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;legend&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;grid&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;show&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;After 20 iterations, the points converge towards the minimum point at $x = -2$. We can see how the points move along the curve of the function, gradually approaching the minimum.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;316430f9-5f75-41db-b0ba-4553e3763a94?format=jpeg&quot; alt=&quot;Gradient Descent Visualization&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;hr&gt;
&lt;h3 id=&quot;scenario-2-negative-gradient&quot;&gt;Scenario 2: Negative Gradient&lt;&#x2F;h3&gt;
&lt;p&gt;Now, let&#x27;s consider a function with a negative gradient:
$$f(x) = -x^3 + 4x^2 - 4.$$&lt;&#x2F;p&gt;
&lt;p&gt;The gradient of this function is:
$$\nabla f(x) = -3x^2 + 8x.$$&lt;&#x2F;p&gt;
&lt;p&gt;Similar to the previous example, we start from an initial point, say $x_0 = 1$, choose a learning rate $\alpha = 0.01$, and apply the gradient descent update rule iteratively:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Compute the gradient at the current point: $\nabla f(x_0) = -3(1)^2 + 8(1) = 5$.&lt;&#x2F;li&gt;
&lt;li&gt;Update the point: $x_1 = x_0 - 0.01 \cdot 5 = 0.95$.&lt;&#x2F;li&gt;
&lt;li&gt;Repeat the process for a number of iterations.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;For the first 5 iterations, we can tabulate the results as follows:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Iteration ($k$)&lt;&#x2F;th&gt;&lt;th&gt;Current Point ($x_k$)&lt;&#x2F;th&gt;&lt;th&gt;Gradient ($\nabla f(x_k)$)&lt;&#x2F;th&gt;&lt;th&gt;Updated Point ($x_{k+1}$)&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;0&lt;&#x2F;td&gt;&lt;td&gt;1.0000&lt;&#x2F;td&gt;&lt;td&gt;5.0000&lt;&#x2F;td&gt;&lt;td&gt;0.9500&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;1&lt;&#x2F;td&gt;&lt;td&gt;0.9500&lt;&#x2F;td&gt;&lt;td&gt;4.8925&lt;&#x2F;td&gt;&lt;td&gt;0.9011&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;2&lt;&#x2F;td&gt;&lt;td&gt;0.9011&lt;&#x2F;td&gt;&lt;td&gt;4.7728&lt;&#x2F;td&gt;&lt;td&gt;0.8533&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;3&lt;&#x2F;td&gt;&lt;td&gt;0.8533&lt;&#x2F;td&gt;&lt;td&gt;4.6422&lt;&#x2F;td&gt;&lt;td&gt;0.8069&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;4&lt;&#x2F;td&gt;&lt;td&gt;0.8069&lt;&#x2F;td&gt;&lt;td&gt;4.5020&lt;&#x2F;td&gt;&lt;td&gt;0.7619&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
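&lt;p&gt;The rows of this table can be reproduced with a short script (a minimal sketch reusing the same update rule, with $\alpha = 0.01$ and $x_0 = 1$):&lt;&#x2F;p&gt;

```python
# Reproduce the first five gradient descent iterations for
# f(x) = -x^3 + 4x^2 - 4, whose gradient is -3x^2 + 8x.
def grad_f(x):
    return -3 * x**2 + 8 * x

alpha = 0.01   # learning rate
x = 1.0        # initial point x_0

for k in range(5):
    g = grad_f(x)
    x_next = x - alpha * g
    print(f"k={k}  x_k={x:.4f}  grad={g:.4f}  x_k+1={x_next:.4f}")
    x = x_next
```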
&lt;p&gt;In this scenario, the gradient stays positive along the path, so each update moves $x$ to the left, toward the local minimum at $x = 0$. The first step is the largest, and the step size gradually decreases as the gradient shrinks near the minimum. As in the previous scenario, gradient descent automatically adjusts its step size based on the distance from the minimum. Similarly, we can visualize the process using a simple plot:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo z-code&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;import&lt;&#x2F;span&gt;&lt;span&gt; numpy&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; as&lt;&#x2F;span&gt;&lt;span&gt; np&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;import&lt;&#x2F;span&gt;&lt;span&gt; matplotlib&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;pyplot&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; as&lt;&#x2F;span&gt;&lt;span&gt; plt&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-punctuation z-definition z-comment&quot;&gt;#&lt;&#x2F;span&gt;&lt;span class=&quot;z-comment&quot;&gt; Define the function and its gradient&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-storage z-type&quot;&gt;def&lt;&#x2F;span&gt;&lt;span class=&quot;z-entity z-name&quot;&gt; f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable z-parameter z-function&quot;&gt;x&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;    return&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; -&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;**&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;3&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; +&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 4&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;*&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;**&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;2&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; -&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 4&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-storage z-type&quot;&gt;def&lt;&#x2F;span&gt;&lt;span class=&quot;z-entity z-name&quot;&gt; grad_f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable z-parameter z-function&quot;&gt;x&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;    return&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; -&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;3&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;*&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;**&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;2&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; +&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 8&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;*&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-punctuation z-definition z-comment&quot;&gt;#&lt;&#x2F;span&gt;&lt;span class=&quot;z-comment&quot;&gt; Gradient Descent parameters&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;alpha&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 0.01&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;x0&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;iterations&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 40&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-punctuation z-definition z-comment&quot;&gt;#&lt;&#x2F;span&gt;&lt;span class=&quot;z-comment&quot;&gt; Store the points&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; [&lt;&#x2F;span&gt;&lt;span&gt;x0&lt;&#x2F;span&gt;&lt;span&gt;]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;for&lt;&#x2F;span&gt;&lt;span&gt; _&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; in&lt;&#x2F;span&gt;&lt;span class=&quot;z-support&quot;&gt; range&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;iterations&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    grad&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; grad_f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;[&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;-&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;1&lt;&#x2F;span&gt;&lt;span&gt;]&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    x_new&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; x_points&lt;&#x2F;span&gt;&lt;span&gt;[&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;-&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;1&lt;&#x2F;span&gt;&lt;span&gt;]&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; -&lt;&#x2F;span&gt;&lt;span&gt; alpha&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; *&lt;&#x2F;span&gt;&lt;span&gt; grad&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    x_points&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;append&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_new&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-punctuation z-definition z-comment&quot;&gt;#&lt;&#x2F;span&gt;&lt;span class=&quot;z-comment&quot;&gt; Plotting&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; np&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;linspace&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;-&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;1&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 3&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 100&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;y&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;plot&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span&gt; y&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; label&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;f(x) = -x^3 + 4x^2 - 4&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;scatter&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span&gt; f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;np&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;array&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; color&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;red&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;plot&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span&gt; f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;np&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;array&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; color&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;red&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; linestyle&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;--&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; label&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;Gradient Descent Path&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;title&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;Gradient Descent on f(x)&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;xlabel&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;x&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;ylabel&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;f(x)&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;legend&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;grid&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;show&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;After 40 iterations, the points converge towards the local minimum at $x = 0$. We can see how the points move along the curve of the function, gradually approaching the minimum. The gradient descent path is plotted on the function curve:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;081bc8a3-abb0-47db-8da6-97c1a7a4ba1b?format=jpeg&quot; alt=&quot;Gradient Descent Visualization&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;hr&gt;
&lt;h2 id=&quot;proof-of-convergence&quot;&gt;Proof of Convergence&lt;&#x2F;h2&gt;
&lt;p&gt;To prove the convergence of the gradient descent algorithm, we need to show that the sequence of points generated by the algorithm converges to a local minimum of the function $f(x)$. We assume that $f$ is a convex function with Lipschitz continuous gradients, meaning there exists a constant $L &amp;gt; 0$ such that for all $x, y \in \mathbb{R}^n$,
$$|\nabla f(x) - \nabla f(y)| \leq L |x - y|.$$&lt;&#x2F;p&gt;
&lt;p&gt;Under convexity, with a fixed step size $\alpha \le 1&#x2F;L$, the rate of convergence is sublinear. Specifically, we can show that after $k$ iterations, the function value satisfies:&lt;&#x2F;p&gt;
&lt;p&gt;$$
f(x_k) - f(x^*) \leq \frac{L |x_0-x^*|^2 }{2k},
$$&lt;&#x2F;p&gt;
&lt;p&gt;where $x^*$ is the global minimum point of $f$. This indicates that as the number of iterations $k$ increases, the function value approaches the minimum value at a rate inversely proportional to $k$.&lt;&#x2F;p&gt;
&lt;p&gt;This completes the proof of convergence for the gradient descent algorithm under the assumptions of convexity and Lipschitz continuous gradients. Under these assumptions, the algorithm converges to the global minimum of the function $f(x)$ by iteratively updating the points in the direction of steepest descent.&lt;&#x2F;p&gt;
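&lt;p&gt;As an illustrative numerical check (not a substitute for the proof), we can verify the $O(1&#x2F;k)$ bound on a simple convex quadratic. Assuming $f(x) = x^2$, the gradient $2x$ is Lipschitz with $L = 2$, and a fixed step size $\alpha \le 1&#x2F;L$ keeps every iterate within the bound:&lt;&#x2F;p&gt;

```python
# Check f(x_k) - f(x*) <= L * |x_0 - x*|^2 / (2k) numerically
# for f(x) = x^2, which is convex with L = 2 and minimum at x* = 0.
def f(x):
    return x**2

def grad_f(x):
    return 2 * x

L = 2.0
alpha = 0.25   # any fixed step size <= 1/L works here
x0 = 3.0

x = x0
for k in range(1, 21):
    x = x - alpha * grad_f(x)
    bound = L * x0**2 / (2 * k)
    assert f(x) <= bound, f"bound violated at iteration {k}"
print("the O(1/k) bound holds for the first 20 iterations")
```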
&lt;p&gt;But this guarantee is tightly tied to the choice of learning rate $\alpha$. If $\alpha$ is too large, the algorithm may overshoot the minimum and diverge. If $\alpha$ is too small, convergence will be very slow. Choosing an appropriate learning rate is therefore crucial for the success of the gradient descent algorithm. There are various techniques to adaptively adjust the learning rate during optimization, such as learning rate schedules and adaptive optimizers like Adam and RMSprop, which we can explore in future posts.&lt;&#x2F;p&gt;
&lt;hr&gt;
&lt;h2 id=&quot;known-issues-of-gradient-descent&quot;&gt;Known Issues of Gradient Descent&lt;&#x2F;h2&gt;
&lt;p&gt;While gradient descent is a powerful optimization algorithm, it does have some known issues. In multivariate functions, the presence of saddle points can affect convergence. Saddle points are points where the gradient is zero, but they are neither local minima nor local maxima. In high-dimensional spaces, saddle points are more prevalent than local minima, and gradient descent can stall at them, leading to slow convergence or failure to find a good minimum. A popular example is the function $f(x, y) = x^2 - y^2$, which has a saddle point at $(0, 0)$. The gradient at this point is zero, so gradient descent may struggle to escape it.&lt;&#x2F;p&gt;
&lt;figure class=&quot;wide&quot;&gt;
  &lt;img src=&quot;https:&amp;#x2F;&amp;#x2F;cdn.cosmos.so&amp;#x2F;41b26430-2fde-4cfb-8d2b-a9debca6d4ee?format=jpeg&quot; alt=&quot;Gradient Descent Saddle Point&quot;&gt;
  
  &lt;figcaption&gt;Gradient Descent Saddle Point&lt;&#x2F;figcaption&gt;
  
&lt;&#x2F;figure&gt;
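&lt;p&gt;A minimal sketch of this behaviour on $f(x, y) = x^2 - y^2$: starting exactly at the saddle, the gradient is zero and the iterate never moves, while starting a tiny perturbation away, the $y$ coordinate escapes only gradually:&lt;&#x2F;p&gt;

```python
# Plain gradient descent on f(x, y) = x^2 - y^2, saddle at (0, 0).
def grad(x, y):
    return 2 * x, -2 * y   # partial derivatives

def descend(x, y, alpha=0.1, steps=50):
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - alpha * gx, y - alpha * gy
    return x, y

# Exactly at the saddle: the gradient is zero, so nothing ever moves.
print(descend(0.0, 0.0))

# Slightly perturbed: x shrinks toward 0, while y grows by a factor
# of (1 + 2 * alpha) per step, so the iterate escapes only slowly.
print(descend(0.0, 1e-8))
```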
&lt;p&gt;To mitigate the issues with saddle points, various techniques can be employed, such as adding noise to the gradients, using momentum-based methods, or employing second-order optimization methods that consider the curvature of the function, which we can explore in future discussions. But overall, gradient descent remains a fundamental and widely used optimization algorithm in machine learning and various other fields.&lt;&#x2F;p&gt;
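&lt;p&gt;As one hedged illustration of these remedies, classical momentum keeps a running velocity so that small but consistent gradients accumulate, letting the iterate leave the neighbourhood of the saddle in fewer steps than plain gradient descent (a toy comparison, not a production optimizer):&lt;&#x2F;p&gt;

```python
# Compare how many steps plain gradient descent and momentum need
# to escape the saddle of f(x, y) = x^2 - y^2 from a nearby point.
def grad(x, y):
    return 2 * x, -2 * y

def steps_to_escape(use_momentum, alpha=0.1, beta=0.9, y0=1e-8):
    x, y = 0.0, y0
    vx = vy = 0.0
    for step in range(1, 10000):
        gx, gy = grad(x, y)
        if use_momentum:
            # the velocity accumulates past gradients
            vx, vy = beta * vx - alpha * gx, beta * vy - alpha * gy
            x, y = x + vx, y + vy
        else:
            x, y = x - alpha * gx, y - alpha * gy
        if abs(y) > 1.0:   # far enough from the saddle
            return step
    return None

plain = steps_to_escape(False)
momentum = steps_to_escape(True)
print(f"plain: {plain} steps, momentum: {momentum} steps")
```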
&lt;hr&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;&#x2F;h2&gt;
&lt;p&gt;Despite its simplicity, gradient descent is a powerful optimization algorithm that forms the backbone of many machine learning algorithms. By iteratively updating the parameters in the direction of the steepest descent, gradient descent effectively finds local minima of differentiable functions. Understanding its mathematical foundations and practical implementations is crucial for anyone working in the field of machine learning and optimization.&lt;&#x2F;p&gt;
&lt;p&gt;It&#x27;s remarkable how such a simple iterative process can optimize almost any complex function in real-life applications. I truly cannot appreciate enough the beauty of this elegant mathematical concept.&lt;&#x2F;p&gt;
&lt;p&gt;For those interested in learning more about gradient descent, I highly recommend the following video by StatQuest, which provides an excellent visual explanation of the algorithm:&lt;&#x2F;p&gt;
&lt;figure class=&quot;wide video-embed&quot;&gt;
  &lt;div class=&quot;video-shell&quot;&gt;
    &lt;iframe
      src=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;embed&#x2F;sDv4f4s2SB8&quot;
      title=&quot;Gradient Descent, Step-by-Step | StatQuest&quot;
      loading=&quot;lazy&quot;
      allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot;
      referrerpolicy=&quot;strict-origin-when-cross-origin&quot;
      allowfullscreen
    &gt;&lt;&#x2F;iframe&gt;
  &lt;&#x2F;div&gt;
  
  &lt;figcaption&gt;Gradient Descent, Step-by-Step | StatQuest&lt;&#x2F;figcaption&gt;
  
&lt;&#x2F;figure&gt;
&lt;p&gt;I hope this post has provided a clear understanding of gradient descent.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>My Gallery of Talentbank Boardroom Challenge 2025</title>
        <published>2025-11-14T00:00:00+00:00</published>
        <updated>2025-11-14T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/blog/talentbank-boardroom-challenge/"/>
        <id>https://jienweng.github.io/blog/talentbank-boardroom-challenge/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/blog/talentbank-boardroom-challenge/">&lt;p&gt;This post is a compact gallery plus debrief from the Talentbank Boardroom Challenge 2025. I keep the narrative focused on preparation, presentation decisions, and the specific lessons carried into later projects.&lt;&#x2F;p&gt;
&lt;figure class=&quot;wide&quot;&gt;
  &lt;img src=&quot;https:&amp;#x2F;&amp;#x2F;cdn.cosmos.so&amp;#x2F;757eebfe-bcda-458c-a666-bb88ff978ed2?format=jpeg&quot; alt=&quot;Talentbank Boardroom Challenge 2025&quot;&gt;
  
&lt;&#x2F;figure&gt;
&lt;figure class=&quot;wide&quot;&gt;
  &lt;img src=&quot;https:&amp;#x2F;&amp;#x2F;cdn.cosmos.so&amp;#x2F;e9ab40f7-f55a-40b0-90f9-43afddce3592?format=jpeg&quot; alt=&quot;Talentbank Boardroom Challenge 2025&quot;&gt;
  
&lt;&#x2F;figure&gt;
&lt;figure class=&quot;wide&quot;&gt;
  &lt;img src=&quot;https:&amp;#x2F;&amp;#x2F;cdn.cosmos.so&amp;#x2F;22cf9c61-bcc4-4cb3-a5d3-901668997566?format=jpeg&quot; alt=&quot;Talentbank Boardroom Challenge 2025&quot;&gt;
  
&lt;&#x2F;figure&gt;
&lt;figure class=&quot;wide&quot;&gt;
  &lt;img src=&quot;https:&amp;#x2F;&amp;#x2F;cdn.cosmos.so&amp;#x2F;e06b9475-0616-4202-b7cb-7644eefba819?format=jpeg&quot; alt=&quot;Talentbank Boardroom Challenge 2025&quot;&gt;
  
&lt;&#x2F;figure&gt;
&lt;figure class=&quot;wide&quot;&gt;
  &lt;img src=&quot;https:&amp;#x2F;&amp;#x2F;cdn.cosmos.so&amp;#x2F;dc8979eb-c7e4-4c5a-9530-921feb0b0ae4?format=jpeg&quot; alt=&quot;Talentbank Boardroom Challenge 2025&quot;&gt;
  
&lt;&#x2F;figure&gt;
&lt;figure class=&quot;wide&quot;&gt;
  &lt;img src=&quot;https:&amp;#x2F;&amp;#x2F;cdn.cosmos.so&amp;#x2F;4a589e9c-6ab9-458d-a659-7c47b8f8b583?format=jpeg&quot; alt=&quot;Talentbank Boardroom Challenge 2025&quot;&gt;
  
&lt;&#x2F;figure&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Reinforcement learning practices in healthcare applications</title>
        <published>2025-10-21T00:00:00+00:00</published>
        <updated>2025-10-21T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/notes/reinforcement-learning-practices-in-healthcare/"/>
        <id>https://jienweng.github.io/notes/reinforcement-learning-practices-in-healthcare/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/notes/reinforcement-learning-practices-in-healthcare/">&lt;p&gt;This note reviews practical reinforcement learning use cases in healthcare and the constraints that matter in deployment. The challenge is that policy learning in clinical settings is high-stakes, partially observed, and often offline. I summarize where RL is promising and where reliability and safety dominate design choices.&lt;&#x2F;p&gt;
&lt;p&gt;In healthcare applications, artificial intelligence (AI) plays a crucial role in transforming patient care, diagnostics, and treatment planning, making healthcare more efficient and effective. However, if AI is used improperly, it may lead to worse outcomes rather than improved ones.&lt;&#x2F;p&gt;
&lt;p&gt;As a subset of AI, reinforcement learning (RL) has shown great promise in optimising sequential decision-making processes, which are common in the healthcare industry. However, applying RL in healthcare settings requires careful attention to several important practices to ensure safe outcomes. To illustrate the pitfalls of reinforcement learning, we consider sepsis management, an area where clinicians&#x27; decision-making remains highly uncertain.&lt;&#x2F;p&gt;
&lt;p&gt;In the context of sepsis, a history may include a patient&#x27;s vital signs, laboratory results, administered treatments, and other relevant clinical information over time. The actions could involve decisions such as administering fluids, vasopressors, or antibiotics at different time points. The rewards are typically defined based on patient outcomes, such as survival rates, length of hospital stay, or improvement in clinical scores. Note that defining ideal sepsis resuscitation strategies is challenging due to the complex and dynamic nature of the condition and the variability in patient responses to treatment; it is therefore not straightforward to define short-term rewards for each action taken.&lt;&#x2F;p&gt;
&lt;p&gt;Here are three fundamental concerns when applying reinforcement learning in healthcare:&lt;&#x2F;p&gt;
&lt;h1 id=&quot;is-the-ai-given-access-to-all-variables-that-infleunce-decision-making&quot;&gt;Is the AI given access to all variables that influence decision-making?&lt;&#x2F;h1&gt;
&lt;p&gt;An RL agent can only look at the recorded data, yet much more information and context should be taken into consideration. Failing to consider all relevant variables may result in estimates that are confounded by spurious correlations.&lt;&#x2F;p&gt;
&lt;p&gt;For instance, severely sick septic patients may receive fluids earlier than healthier patients yet have worse outcomes. This is because they were sicker in the first place, not because the fluids worsened their outcomes. It is therefore important to account for possible confounding factors, even more so than in standard prediction studies, because the sequential nature of the problem can introduce confounding effects in both the short term and the long term.&lt;&#x2F;p&gt;
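&lt;p&gt;A toy simulation makes this concrete (my own sketch, not from the cited paper): here illness severity is an unobserved confounder that drives both fluid administration and mortality, so a naive comparison makes fluids look harmful even though, in this model, they truly help:&lt;&#x2F;p&gt;

```python
import random

random.seed(0)

# Toy confounding illustration: severity drives both treatment assignment
# and outcome. Fluids truly help (they subtract 0.05 from the death
# probability), but the naive comparison says the opposite.
n = 20_000
deaths = {True: 0, False: 0}
counts = {True: 0, False: 0}
for _ in range(n):
    severity = random.random()              # unobserved illness severity
    treated = severity > random.random()    # sicker patients get fluids more often
    p_death = max(0.0, 0.6 * severity - (0.05 if treated else 0.0))
    died = random.random() > 1.0 - p_death  # death occurs with probability p_death
    counts[treated] += 1
    deaths[treated] += died

treated_rate = deaths[True] / counts[True]
control_rate = deaths[False] / counts[False]
print(f"mortality with fluids: {treated_rate:.2f}, without: {control_rate:.2f}")
```

The treated group skews toward sicker patients, so its raw mortality rate comes out higher despite the beneficial treatment effect.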
&lt;h1 id=&quot;how-big-was-that-big-data&quot;&gt;How big was that big data?&lt;&#x2F;h1&gt;
&lt;p&gt;This one is relatively straightforward. Any AI model needs an adequate amount of useful information during training, and RL models are no exception. For an RL model to evaluate a new policy, it needs to find long, continuous sequences of decisions in the historical data that match the new policy.&lt;&#x2F;p&gt;
&lt;p&gt;When a new treatment policy is evaluated against historical data, known as off-policy evaluation, the effective sample size can become small, because the mismatches grow with the number of decisions in a patient&#x27;s history. In one sepsis study, a cohort of 3,855 patients yielded an effective sample size of only a few dozen. Observational data should therefore be used to refine existing practices rather than to explore entirely new treatment approaches.&lt;&#x2F;p&gt;
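&lt;p&gt;The shrinkage is easy to see in a back-of-the-envelope sketch (the per-step agreement probability below is illustrative, not taken from the study): a logged trajectory is only fully usable if it agrees with the new policy at every decision, so the expected count decays exponentially with the horizon:&lt;&#x2F;p&gt;

```python
# Expected number of logged trajectories that agree with a new policy
# at every decision point, assuming an independent per-step match probability.
def effective_sample_size(n_patients, n_decisions, p_match_per_step):
    return n_patients * p_match_per_step ** n_decisions

# 3,855 patients, 50% chance of agreeing with the new policy per decision
for horizon in (1, 5, 10):
    print(horizon, effective_sample_size(3855, horizon, 0.5))
```

Even a modest 10-decision history leaves only a handful of matching trajectories out of thousands of patients, which is why long horizons make off-policy evaluation so data-hungry.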
&lt;h1 id=&quot;will-the-ai-behave-prospectively-as-intended&quot;&gt;Will the AI behave prospectively as intended?&lt;&#x2F;h1&gt;
&lt;p&gt;One of the core elements in the feedback loop of any RL system is the reward. However, if the design of the reward function is not handled properly (e.g. errors in formulation or data processing), the model will eventually lead to poor decisions.&lt;&#x2F;p&gt;
&lt;p&gt;Often, an overly simple reward function neglects long-term effects. For instance, rewarding only blood pressure targets may produce an agent that harms long-term outcomes by dosing patients with excessive vasopressors. Additionally, the learned policy might decay over time if treatment standards change.&lt;&#x2F;p&gt;
&lt;p&gt;Therefore, it is necessary to use interpretable machine learning to interrogate learned policies and assess whether they will behave as intended in a prospective clinical setting.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;conclusion&quot;&gt;Conclusion&lt;&#x2F;h1&gt;
&lt;p&gt;In the end, although RL offers promising opportunities for optimising sequential treatments in medicine, we should be cautious about deploying it into production; due diligence is required to safely realise its potential in this life-saving industry.&lt;&#x2F;p&gt;
&lt;figure class=&quot;wide&quot;&gt;
  &lt;img src=&quot;&amp;#x2F;img&amp;#x2F;posts&amp;#x2F;guidelines_for_reinforcement_learning_in_healthcare.jpg&quot; alt=&quot;Guidelines for Reinforcement Learning in Healthcare&quot;&gt;
  
  &lt;figcaption&gt;Guidelines for Reinforcement Learning in Healthcare&lt;&#x2F;figcaption&gt;
  
&lt;&#x2F;figure&gt;
&lt;p&gt;Finally, I would like to express my gratitude to Omer Gottesman et al. for providing such a practical viewpoint on standardising the application of reinforcement learning in clinical settings.&lt;&#x2F;p&gt;
&lt;p&gt;References:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Gottesman, O., Johansson, F., Komorowski, M., Faisal, A., Sontag, D., Doshi-Velez, F., &amp;amp; Celi, L. A. (2019). Guidelines for reinforcement learning in healthcare. Nature Medicine, 25(1), 16–18. &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;doi.org&#x2F;10.1038&#x2F;s41591-018-0310-5&quot;&gt;https:&#x2F;&#x2F;doi.org&#x2F;10.1038&#x2F;s41591-018-0310-5&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Action-value methods with incremental step size in reinforcement learning</title>
        <published>2025-10-17T00:00:00+00:00</published>
        <updated>2025-10-17T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/notes/action-value-methods-with-incremental-step-size/"/>
        <id>https://jienweng.github.io/notes/action-value-methods-with-incremental-step-size/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/notes/action-value-methods-with-incremental-step-size/">&lt;p&gt;This note derives the incremental update rule for action-value estimation in k-armed bandits and explains why it is preferable to recomputing full averages. The problem is memory and compute cost when rewards accumulate over time. By the end, you get a practical update equation you can implement directly in RL experiments.&lt;&#x2F;p&gt;
&lt;p&gt;Consider any &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Multi-armed_bandit&quot;&gt;k-armed bandit problem&lt;&#x2F;a&gt;, where each action taken corresponds to pulling an arm of a slot machine, and the machine gives us a reward based on the action taken. We denote the action selected at time step $t$ as $A_t$, and the corresponding reward received as $R_t$. Among the $k$ actions, the expected value of each action $a$ is denoted as $q_*(a)=\mathbb{E}[R_t|A_t=a]$, which is also known as the &lt;em&gt;value&lt;&#x2F;em&gt; of that action $a$, or, as the community commonly calls it, the &lt;em&gt;true value&lt;&#x2F;em&gt;. It would be rational to choose the action with the highest value, but in practice the value of each action is unknown. Therefore, we need to estimate the value of each action, denoted $Q_t(a)$ and read as the estimated value of action $a$ at time step $t$. The fundamental goal is to find a $Q_t(a)$ that is as close as possible to $q_*(a)$.&lt;&#x2F;p&gt;
&lt;p&gt;To estimate the value of each action, using what are collectively called &lt;em&gt;action-value methods&lt;&#x2F;em&gt;, we can use the sample-average method to update the estimated value of action $a$ at time step $t$ as follows:&lt;&#x2F;p&gt;
&lt;p&gt;$$Q_{t}(a)={{\text{sum of rewards when $a$ taken prior to $t$}} \over {\text{number of times $a$ taken prior to $t$}}}.$$&lt;&#x2F;p&gt;
&lt;p&gt;This method simply averages all the rewards received when action $a$ was taken prior to time step $t$. To simplify the notation, we focus on a single action. We denote $R_i$ as the reward received after the $i$-th selection of this action, and we let $Q_n$ denote the estimate of its action value after it has been taken $n-1$ times. We can then rewrite the update rule as follows:&lt;&#x2F;p&gt;
&lt;p&gt;$$Q_n = \frac{\sum_{i=1}^{n-1}R_i}{n-1} = \frac{R_1+R_2+\ldots+R_{n-1}}{n-1}.$$&lt;&#x2F;p&gt;
&lt;p&gt;As the number of times the action has been taken, $n$, increases, the obvious way to update the estimate is to recalculate the average by summing all the previous rewards and dividing by $n-1$. However, as $n$ grows large, this method becomes progressively more expensive: we need to store all the previous rewards and recalculate the sum every time we update the estimate.&lt;&#x2F;p&gt;
&lt;p&gt;But is there a better way to update the estimate without storing all the previous rewards? The answer is yes. We derive the incremental formula for updating the estimate:&lt;&#x2F;p&gt;
&lt;p&gt;$$
\begin{align*}
Q_{n+1} &amp;amp; = \frac{1}{n}\sum_{i=1}^{n}R_i \\
&amp;amp; = \frac{1}{n}\left(R_n + \sum_{i=1}^{n-1}R_i\right) \\
&amp;amp; = \frac{1}{n}\left(R_n + (n-1)\frac{1}{n-1}\sum_{i=1}^{n-1}R_i\right) \\
&amp;amp; = \frac{1}{n}\left(R_n + (n-1)Q_n\right) \\
&amp;amp; = \frac{1}{n}\left(R_n+nQ_n-Q_n \right) \\
&amp;amp; = Q_n + \frac{1}{n}[R_n - Q_n].
\end{align*}
$$&lt;&#x2F;p&gt;
&lt;p&gt;This incremental formula allows us to update the estimate $Q_n$ to $Q_{n+1}$ by only using the most recent reward $R_n$ and the previous estimate $Q_n$, without the need to store all the previous rewards. The term $\frac{1}{n}$ serves as the step size, which decreases as $n$ increases, ensuring that the estimate converges to the true value over time.&lt;&#x2F;p&gt;
&lt;p&gt;Even for $n=1$, we still obtain $Q_2 = R_1$ for an arbitrary initial estimate $Q_1$. In this case, the initial estimate $Q_1$ is completely ignored after the first update, as it should be. In processing the $n$th reward, the estimate is adjusted by a fraction of the error term $[R_n - Q_n]$, the difference between the received reward and the current estimate. This adjustment is scaled by the step size $\frac{1}{n}$, which ensures that as more data is collected, the updates become smaller, allowing the estimate to stabilize around the true value. Note that the step size here is not constant; it decreases as the number of times the action has been taken increases.&lt;&#x2F;p&gt;
&lt;p&gt;Back to the bandit problem, the proposed simulation in pseudo-code is as follows:&lt;&#x2F;p&gt;
&lt;details class=&quot;detail-block&quot;&gt;
  &lt;summary&gt;Bandit Problem with Incremental Step Size&lt;&#x2F;summary&gt;
  &lt;div class=&quot;detail-body&quot;&gt;
    &lt;pre class=&quot;giallo z-code&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Initialize&lt;&#x2F;span&gt;&lt;span&gt; Q&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;a&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt; arbitrarily&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; for&lt;&#x2F;span&gt;&lt;span class=&quot;z-support&quot;&gt; all&lt;&#x2F;span&gt;&lt;span&gt; actions&lt;&#x2F;span&gt;&lt;span&gt; a&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;For&lt;&#x2F;span&gt;&lt;span&gt; each&lt;&#x2F;span&gt;&lt;span&gt; time&lt;&#x2F;span&gt;&lt;span&gt; step&lt;&#x2F;span&gt;&lt;span&gt; t&lt;&#x2F;span&gt;&lt;span&gt; = &lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;1&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 2&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; ...&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    Select&lt;&#x2F;span&gt;&lt;span&gt; action&lt;&#x2F;span&gt;&lt;span&gt; A_t&lt;&#x2F;span&gt;&lt;span&gt; using&lt;&#x2F;span&gt;&lt;span&gt; a&lt;&#x2F;span&gt;&lt;span&gt; policy&lt;&#x2F;span&gt;&lt;span&gt; derived&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; from&lt;&#x2F;span&gt;&lt;span&gt; Q&lt;&#x2F;span&gt;&lt;span&gt; (&lt;&#x2F;span&gt;&lt;span&gt;e&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;g&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span&gt; ε&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;-&lt;&#x2F;span&gt;&lt;span&gt;greedy&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    Take&lt;&#x2F;span&gt;&lt;span&gt; action&lt;&#x2F;span&gt;&lt;span&gt; A_t&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; and&lt;&#x2F;span&gt;&lt;span&gt; observe&lt;&#x2F;span&gt;&lt;span&gt; reward&lt;&#x2F;span&gt;&lt;span&gt; R_t&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    Update&lt;&#x2F;span&gt;&lt;span&gt; the&lt;&#x2F;span&gt;&lt;span&gt; estimate&lt;&#x2F;span&gt;&lt;span&gt; Q&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;A_t&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt; using&lt;&#x2F;span&gt;&lt;span&gt;:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        n&lt;&#x2F;span&gt;&lt;span&gt; = &lt;&#x2F;span&gt;&lt;span&gt;number&lt;&#x2F;span&gt;&lt;span&gt; of&lt;&#x2F;span&gt;&lt;span&gt; times&lt;&#x2F;span&gt;&lt;span&gt; action&lt;&#x2F;span&gt;&lt;span&gt; A_t&lt;&#x2F;span&gt;&lt;span&gt; has&lt;&#x2F;span&gt;&lt;span&gt; been&lt;&#x2F;span&gt;&lt;span&gt; taken&lt;&#x2F;span&gt;&lt;span&gt; prior&lt;&#x2F;span&gt;&lt;span&gt; to&lt;&#x2F;span&gt;&lt;span&gt; time&lt;&#x2F;span&gt;&lt;span&gt; t&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        Q&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;A_t&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt; = &lt;&#x2F;span&gt;&lt;span&gt;Q&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;A_t&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; +&lt;&#x2F;span&gt;&lt;span&gt; (&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;1&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;&#x2F;&lt;&#x2F;span&gt;&lt;span&gt;n&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; *&lt;&#x2F;span&gt;&lt;span&gt; [&lt;&#x2F;span&gt;&lt;span&gt;R_t&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; -&lt;&#x2F;span&gt;&lt;span&gt; Q&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;A_t&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;End&lt;&#x2F;span&gt;&lt;span&gt; For&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
  &lt;&#x2F;div&gt;
&lt;&#x2F;details&gt;
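&lt;p&gt;A minimal runnable sketch of the pseudo-code above (the arm values, number of steps, and ε are arbitrary choices for illustration):&lt;&#x2F;p&gt;

```python
import random

random.seed(1)

# 3-armed bandit with epsilon-greedy selection and the incremental
# sample-average update Q(A) = Q(A) + (1/n) * (R - Q(A)).
true_values = [0.2, 0.5, 0.8]   # q*(a), unknown to the agent
k = len(true_values)
Q = [0.0] * k                   # action-value estimates Q(a)
N = [0] * k                     # per-action selection counts n
epsilon = 0.1

for t in range(5000):
    if random.random() > epsilon:                 # exploit with prob 1 - epsilon
        a = max(range(k), key=lambda i: Q[i])
    else:                                         # explore uniformly at random
        a = random.randrange(k)
    r = true_values[a] + random.gauss(0.0, 1.0)   # noisy reward around q*(a)
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]                     # incremental update, no reward history kept

print([round(q, 2) for q in Q])
```

Only the current estimates and counts are stored, yet the estimates converge toward the true values; for nonstationary problems, a constant step size would be used instead so the estimate can track a moving target.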
&lt;p&gt;To conclude, we derived the incremental step-size method for updating action-value estimates, which is extensively applied in reinforcement learning. This method is computationally efficient because it does not require storing all previous rewards, and it ensures convergence to the true action values over time.&lt;&#x2F;p&gt;
&lt;figure class=&quot;wide&quot;&gt;
  &lt;img src=&quot;https:&amp;#x2F;&amp;#x2F;m.media-amazon.com&amp;#x2F;images&amp;#x2F;I&amp;#x2F;81EBg4xmLgL._UF1000,1000_QL80_.jpg&quot; alt=&quot;Reinforcement Learning: An Introduction&quot;&gt;
  
  &lt;figcaption&gt;Reinforcement Learning: An Introduction&lt;&#x2F;figcaption&gt;
  
&lt;&#x2F;figure&gt;
&lt;p&gt;I want to express my gratitude to Sutton and Barto for their excellent book &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;web.stanford.edu&#x2F;class&#x2F;psych209&#x2F;Readings&#x2F;SuttonBartoIPRLBook2ndEd.pdf&quot;&gt;Reinforcement Learning: An Introduction&lt;&#x2F;a&gt; that provides a comprehensive introduction to the concepts and algorithms of reinforcement learning.&lt;&#x2F;p&gt;
&lt;p&gt;References:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Sutton, R. S., &amp;amp; Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press. &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;web.stanford.edu&#x2F;class&#x2F;psych209&#x2F;Readings&#x2F;SuttonBartoIPRLBook2ndEd.pdf&quot;&gt;https:&#x2F;&#x2F;web.stanford.edu&#x2F;class&#x2F;psych209&#x2F;Readings&#x2F;SuttonBartoIPRLBook2ndEd.pdf&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>A Small Talk About Hackathons</title>
        <published>2025-08-07T00:00:00+00:00</published>
        <updated>2025-08-07T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/blog/something-about-hackathon/"/>
        <id>https://jienweng.github.io/blog/something-about-hackathon/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/blog/something-about-hackathon/">&lt;p&gt;This post is a short reflection on hackathon culture from a participant perspective: what helps teams learn fast, where teams usually waste time, and how to keep projects grounded under deadline pressure.&lt;&#x2F;p&gt;
&lt;p&gt;Along the way, I&#x27;ve met people who are far more experienced, far more knowledgeable than me. And honestly, I can&#x27;t compete with them. It&#x27;s easy to feel small in those moments.&lt;&#x2F;p&gt;
&lt;p&gt;For anyone who&#x27;s been through hackathons, you&#x27;ll know. The amount of time, energy, effort you need to commit is just INSANE. Most competitions require you to go through multiple stages: prelim round, and sometimes a semi-final, and then the final round. Every round is like a mini-marathon, endless brainstorming, last-minute changes, and almost no sleep just to push through and deliver something that works.&lt;&#x2F;p&gt;
&lt;p&gt;In the first few hackathons, I was genuinely excited. It was fun. Every time one ended, I was already looking forward to the next. But after round after round of hackathons, I started to feel the burn. Exhausted. Empty. YES, I still learned something new in every game, but the energy, the excitement, just started to wear off.&lt;&#x2F;p&gt;
&lt;p&gt;Eventually, it all started to feel a bit empty.&lt;&#x2F;p&gt;
&lt;p&gt;There was also this lingering thought back in my mind: &quot;Why am I comparing myself to people who are fully in this industry, who live and breathe in tech every single day, when I&#x27;m still just a student trying to explore things outside my field?&quot;&lt;&#x2F;p&gt;
&lt;p&gt;There&#x27;s still one more hackathon coming up, and I&#x27;ll give it what I can. But after that, I really need a break. A proper one. Not just from hackathons, but from the constant cycle of proving myself. I need to breathe, reset and work on my inner health, mentally and emotionally.&lt;&#x2F;p&gt;
&lt;p&gt;Not saying I&#x27;m quitting hackathons forever. Not at all. But I know I need some time to focus more on myself, rebuild myself, and come back stronger.&lt;&#x2F;p&gt;
&lt;p&gt;Thanks for reading until here. Sometimes, the best thing you can do for your growth is step back, realign, and then go again.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Dead Internet Theory #1</title>
        <published>2025-07-22T00:00:00+00:00</published>
        <updated>2025-07-22T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/blog/dead-internet-theory-1/"/>
        <id>https://jienweng.github.io/blog/dead-internet-theory-1/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/blog/dead-internet-theory-1/">&lt;p&gt;This post is a personal reflection on authenticity online: what feels different now, why AI-generated social content often feels hollow, and where I might be overreacting. The goal is not to claim a grand theory, but to document a concrete shift in reading experience across LinkedIn and Reddit.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;b77a432b-7713-4b10-879d-e8256d284766?format=jpeg&quot; alt=&quot;You can&amp;#39;t tell whether the experience is real or not&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;I do admit I used ChatGPT for my content in the past, but I realised that the content is not really what I’ve done before, the experience is basically “artificial”. I don’t deny AI as a productivity tool, but sometimes you just can’t tell the existence of the content written there. There’s a coldness, an emptiness that creeps in when scrolling through these posts, making it feel like shouting into a void where no real person listens.&lt;&#x2F;p&gt;
&lt;p&gt;It is really different nowadays to scroll through social media, LinkedIn, and Reddit. You don’t feel people there. It makes me really feel like the Dead Internet Theory is here, and everything is full of bot activity and automatically generated content manipulated by algorithms. The authenticity of online interactions seems to be fading, replaced by an automated perfection that feels disturbingly hollow.&lt;&#x2F;p&gt;
&lt;p&gt;Chatbots or AI are really good for proofreading, but they are still only good at proofreading. Soon, I think, people will no longer put real content or real ideas into writing about their experiences; posts just become full of BS nowadays. These days I even wish to find some long-ass article that is awfully organised; it&#x27;s fun to read through even when it isn&#x27;t good, because I can find authenticity there.&lt;&#x2F;p&gt;
&lt;p&gt;Has anyone noticed this? You could share your observations with me or offer me a perspective I haven’t considered before. I’d be more than happy to discuss it. I think we might have another follow-up episode on this.&lt;&#x2F;p&gt;
&lt;p&gt;Share your thoughts with me: &lt;a href=&quot;mailto:contact@jienweng.com&quot;&gt;contact@jienweng.com&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Making Deepseek R1 ChatBot</title>
        <published>2024-12-30T00:00:00+00:00</published>
        <updated>2024-12-30T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/blog/deepseek-r1-chatbot/"/>
        <id>https://jienweng.github.io/blog/deepseek-r1-chatbot/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/blog/deepseek-r1-chatbot/">&lt;p&gt;This post documents a small DeepSeek-R1 chatbot build: why I chose the model, what setup decisions mattered, and what worked in practice. Instead of focusing on AI industry drama, I keep the write-up centered on implementation choices and takeaways for future iterations.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.linkedin.com&#x2F;feed&#x2F;update&#x2F;urn:li:activity:7291035071992520704&#x2F;&quot;&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;d66e7a6d-8205-4e9c-ba4f-656971c79857?format=jpeg&quot; alt=&quot;“Good artists copy, great artists steal” - Steve Jobs&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;But forget the drama for a second, because the best part? Deepseek is open-source. That’s a huge win for the AI community. No more being locked behind API paywalls or waiting for some corporate overlord to decide what we can or can’t do. It’s out there, free to tinker with, and you bet I had to try it out for myself.&lt;&#x2F;p&gt;
&lt;p&gt;So, I went ahead and did something I had wanted to do for soooo long -- built a chatbot. It’s not packed with fancy features (yet), but through this little experiment, I’ve discovered some pretty interesting things about how the Deepseek R1 model works. You can try it out live &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&quot;&gt;here&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;btw, we won’t dive into the technical aspects just yet—that’s coming up in the next section! Stay tuned for more details on how these improvements will work behind the scenes.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-unique-thinking-approach&quot;&gt;The Unique &quot;Thinking&quot; Approach&lt;&#x2F;h3&gt;
&lt;p&gt;What blows my mind the most about this whole setup is how I managed to separate the model’s thinking process from its final response. Most chatbots out there? They just spit out an answer, and you have no idea what’s happening behind the scenes. But with this, you can actually see how the model thinks through a problem before giving an answer. It’s like watching an AI have an inner monologue, refining its thoughts before speaking. And honestly? I’ve never seen this before in any LLMs I’ve used.&lt;&#x2F;p&gt;
&lt;p&gt;At first, I didn’t even plan for this feature—it just happened while I was testing out different ways to improve response quality. I noticed that the model was generating some hidden reasoning steps before its final output. Instead of discarding them, I figured, Why not show them? And once I did, it was a game-changer. It made the AI feel so much more transparent—almost like it was thinking out loud.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&quot;&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;b65be583-5569-41e4-ab62-b4bf500120e1?format=jpeg&quot; alt=&quot;Fun Fact: The &amp;quot;thinking&amp;quot; parts are actually generated as HTML!&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;For example, if you ask it something like, “What do you think about climate change in Malaysia?”, you won’t just get a final answer out of nowhere. You’ll actually see the model go through a step-by-step breakdown of its thought process:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Breaking down the question components&lt;&#x2F;li&gt;
&lt;li&gt;Evaluating current knowledge&lt;&#x2F;li&gt;
&lt;li&gt;Forming logical connections&lt;&#x2F;li&gt;
&lt;li&gt;Synthesizing a comprehensive response&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;After seeing the model’s thinking process, what really stands out to me is how structured its response is. It doesn’t just throw out some generic take on climate change—it actually analyzes the question, breaks it down into different angles, and then builds a well-organized answer.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&quot;&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;6c9e3e2a-d318-4615-8786-7d60248fc049?format=jpeg&quot; alt=&quot;Interesting Observation: The model sometimes includes unexpected details—some accurate, some a bit off!&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;That said, while the response does sound solid, there are some oddities that make me wonder what’s going on under the hood. For example, it mentions “the subtropical Andaman and Nicobar Islands”—which, uh, aren’t even part of Malaysia. Also, “Ch bamboo” initiative? Never heard of that one. These small but noticeable mistakes show that while the model is good at structuring its answers, it still struggles with factual accuracy.&lt;&#x2F;p&gt;
&lt;p&gt;But that’s exactly what makes having a visible thought process so useful. Instead of just blindly trusting AI responses, we can now see how the model arrives at its conclusions—which means we can spot errors more easily. If it had &lt;strong&gt;hallucinated&lt;&#x2F;strong&gt; this stuff in a normal chatbot, I might not have even noticed. But because I can watch it reason through the problem, I can tell where things might be going wrong.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&quot;&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;8cf44525-7111-4a09-87c7-ee6a09d3cb3b?format=jpeg&quot; alt=&quot;AI Hallucination Meme&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;This kind of transparency is what makes AI feel less like a magic black box and more like an actual tool that we can guide, correct, and refine. And that’s honestly what excites me the most about this project.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;deployment-specifications&quot;&gt;Deployment Specifications&lt;&#x2F;h3&gt;
&lt;p&gt;The chatbot is currently hosted on Hugging Face Spaces, running on a basic-tier instance, which means it’s not exactly a powerhouse but still gets the job done. Here’s what it’s running on:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 2 vCPUs&lt;&#x2F;li&gt;
&lt;li&gt;RAM: 16GB&lt;&#x2F;li&gt;
&lt;li&gt;Storage: Basic instance storage&lt;&#x2F;li&gt;
&lt;li&gt;Framework: Gradio&lt;&#x2F;li&gt;
&lt;li&gt;Inference Optimization: FP16 quantization&lt;&#x2F;li&gt;
&lt;li&gt;Average Response Time: 2-3 seconds&lt;&#x2F;li&gt;
&lt;li&gt;Concurrent Users Supported: Up to 10&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;You might notice that the &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&quot;&gt;live preview&lt;&#x2F;a&gt; here can be a bit slow while generating responses. That’s because the hardware isn’t optimized for LLM inference, so it’s working with some limitations. Hope you can bear with it! 😆&lt;&#x2F;p&gt;
&lt;p&gt;If you enjoy the project and want to see it run smoother, you can consider &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;buymeacoffee.com&#x2F;jianrong_jr&quot;&gt;sponsoring me&lt;&#x2F;a&gt;. Who knows? With enough support, I might upgrade the resources for future projects and push this even further :D&lt;&#x2F;p&gt;
&lt;h3 id=&quot;efficient-model-architecture&quot;&gt;Efficient Model Architecture&lt;&#x2F;h3&gt;
&lt;p&gt;The chatbot uses the Deepseek R1 Distilled 1.5B model, which is a significantly compressed version of the original 685B parameter model. Despite having only 1.5 billion parameters, it maintains impressive performance for many tasks.&lt;&#x2F;p&gt;
&lt;p&gt;Key points about the model:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Original model: &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;deepseek-ai&#x2F;DeepSeek-R1&quot;&gt;DeepSeek R1 (685B)&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Distilled version: &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;deepseek-ai&#x2F;DeepSeek-R1-Distill-Qwen-1.5B&quot;&gt;DeepSeek R1 Distill Qwen 1.5B&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Roughly 450x parameter reduction while maintaining core capabilities&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;blog&#x2F;open-r1&quot;&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;8eb746ad-784b-4892-9eb0-4cd26a82af13?format=jpeg&quot; alt=&quot;Model Architecture&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;h3 id=&quot;impressive-benchmark-results&quot;&gt;Impressive Benchmark Results&lt;&#x2F;h3&gt;
&lt;p&gt;What’s most fascinating about this model is how well it holds up when compared to much larger models. Despite having far fewer parameters, it manages to outperform some big names in the AI world for certain tasks.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;medium.com&#x2F;data-science-in-your-pocket&#x2F;deepseek-r1-distill-qwen-1-5b-the-best-small-sized-llm-14eee304d94b&quot;&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;2ac19291-5c72-4a60-b290-bd140a61a4d4?format=jpeg&quot; alt=&quot;Model Comparison Results&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;h4 id=&quot;outstanding-performance-in-key-areas&quot;&gt;Outstanding Performance in Key Areas&lt;&#x2F;h4&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;AIME 2024 (Math Competition)&lt;&#x2F;strong&gt;
&lt;ul&gt;
&lt;li&gt;DeepSeek R1 Distilled: 28.9% Pass@1&lt;&#x2F;li&gt;
&lt;li&gt;GPT-4o: 9.3% Pass@1&lt;&#x2F;li&gt;
&lt;li&gt;Claude 3.5: 16.0% Pass@1&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;MATH-500 (Mathematical Reasoning)&lt;&#x2F;strong&gt;
&lt;ul&gt;
&lt;li&gt;DeepSeek R1 Distilled: 83.9% Pass@1&lt;&#x2F;li&gt;
&lt;li&gt;GPT-4o: 74.6% Pass@1&lt;&#x2F;li&gt;
&lt;li&gt;Claude 3.5: 78.3% Pass@1&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Codeforces (Competitive Programming)&lt;&#x2F;strong&gt;
&lt;ul&gt;
&lt;li&gt;DeepSeek R1 Distilled: 954 Rating&lt;&#x2F;li&gt;
&lt;li&gt;GPT-4o: 759 Rating&lt;&#x2F;li&gt;
&lt;li&gt;Claude 3.5: 717 Rating&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;h3 id=&quot;model-strengths-limitations&quot;&gt;Model Strengths &amp;amp; Limitations&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;Strengths:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Superior reasoning capabilities, especially in mathematics&lt;&#x2F;li&gt;
&lt;li&gt;Highly efficient with only 1.5B parameters&lt;&#x2F;li&gt;
&lt;li&gt;Effective knowledge distillation from larger models&lt;&#x2F;li&gt;
&lt;li&gt;Excellent performance in zero-shot scenarios&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;Limitations:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Lower performance in general coding tasks&lt;&#x2F;li&gt;
&lt;li&gt;Potential language mixing issues&lt;&#x2F;li&gt;
&lt;li&gt;Sensitivity to prompt formatting&lt;&#x2F;li&gt;
&lt;li&gt;Limited performance in broader general knowledge tasks&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;This balanced perspective shows why I chose this model for my chatbot implementation: it provides exceptional reasoning capabilities while remaining lightweight enough for practical deployment.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;try-it-yourself&quot;&gt;Try It Yourself&lt;&#x2F;h3&gt;
&lt;p&gt;Due to iframe restrictions, you can access the live demo through these methods:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&quot;&gt;Direct Link to Demo&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;docs&#x2F;hub&#x2F;spaces-sdks-docker#rest-api&quot;&gt;API Documentation&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&#x2F;tree&#x2F;main&quot;&gt;Source Code&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&quot;&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;0eab3363-76da-4dab-b44b-a4a5d7cdc96f?format=jpeg&quot; alt=&quot;Chatbot interface&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;h3 id=&quot;summary&quot;&gt;Summary&lt;&#x2F;h3&gt;
&lt;p&gt;Deepseek has definitely shaken things up in the AI world, and the drama surrounding it is just the tip of the iceberg. Forget the finger-pointing—this move is a win for the AI community, especially since Deepseek is open-source. No more waiting around for companies to decide how we can use AI; now it’s out there for everyone to play with and improve.&lt;&#x2F;p&gt;
&lt;p&gt;And as for my little experiment—building a chatbot with the Deepseek R1 model—it’s not feature-packed yet, but it’s definitely been a fun ride. You can try it out live &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&quot;&gt;here&lt;&#x2F;a&gt; and see how it works for yourself!&lt;&#x2F;p&gt;
&lt;h3 id=&quot;additional-resources&quot;&gt;Additional Resources&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;deepseek-ai&#x2F;DeepSeek-R1-Distill-Qwen-1.5B&quot;&gt;Model Card&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;docs&#x2F;hub&#x2F;spaces-overview&quot;&gt;Deployment Guide&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;mteb&#x2F;leaderboard&quot;&gt;Performance Benchmarks&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;discuss.huggingface.co&#x2F;&quot;&gt;Community Discussion&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Feel free to experiment with the live demo and share your thoughts!&lt;&#x2F;p&gt;
&lt;h3 id=&quot;references&quot;&gt;References&lt;&#x2F;h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;DeepSeek R1 (685B)&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;The original DeepSeek R1 model, a large-scale AI model with 685 billion parameters, was the precursor to the distilled 1.5B version used in the chatbot.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;deepseek-ai&#x2F;DeepSeek-R1&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;DeepSeek R1 Distill Qwen 1.5B&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;This is the distilled version of the DeepSeek R1 model, compressed to 1.5 billion parameters while retaining core capabilities.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;deepseek-ai&#x2F;DeepSeek-R1-Distill-Qwen-1.5B&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Open R1 Model Architecture&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;Explore the detailed architecture of the DeepSeek R1 model, showcasing its design and structure.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;blog&#x2F;open-r1&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Medium - Deepseek R1 Distill Qwen 1.5B Performance&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;A comparison of the performance between Deepseek R1 Distilled and other models, showing its impressive results in multiple domains.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;medium.com&#x2F;data-science-in-your-pocket&#x2F;deepseek-r1-distill-qwen-1-5b-the-best-small-sized-llm-14eee304d94b&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Hugging Face Space - Chatbot Demo&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;Live demo of the Deepseek R1 chatbot that showcases the model’s response and reasoning capabilities.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Hugging Face - API Documentation&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;Official API documentation for Hugging Face Spaces, helping developers interact with models and integrate them into applications.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;docs&#x2F;hub&#x2F;spaces-sdks-docker#rest-api&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Hugging Face - Source Code&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;Direct access to the source code of the Deepseek R1 chatbot project on Hugging Face Spaces for those interested in contributing or learning.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&#x2F;tree&#x2F;main&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Hugging Face - Model Card&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;Official card for the Deepseek R1 Distilled model, providing details on its functionality and training specifications.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;deepseek-ai&#x2F;DeepSeek-R1-Distill-Qwen-1.5B&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Hugging Face - Deployment Guide&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;Guidelines for deploying models and applications using Hugging Face Spaces.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;docs&#x2F;hub&#x2F;spaces-overview&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Hugging Face - Performance Benchmarks&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;An overview of the model performance across various tasks and benchmarks, showcasing the strengths and weaknesses of different models.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;mteb&#x2F;leaderboard&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Hugging Face - Community Discussion&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;Join the community discussions on Hugging Face, where users can ask questions, share insights, and discuss AI-related topics.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;discuss.huggingface.co&#x2F;&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Eco Finance: A Sustainable Future Prototype</title>
        <published>2024-12-29T00:00:00+00:00</published>
        <updated>2024-12-29T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/blog/eco-finance/"/>
        <id>https://jienweng.github.io/blog/eco-finance/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/blog/eco-finance/">&lt;p&gt;This post focuses on the Eco Finance prototype itself: the problem we targeted, the product concept, and what we learned from turning an idea into a demo under hackathon constraints. It is written as a project debrief rather than an event recap.&lt;&#x2F;p&gt;
&lt;p&gt;We participated in PayHack 2024, a major hackathon that brought together many talented people.&lt;&#x2F;p&gt;
&lt;p&gt;Our project, Eco Finance, is built around Malaysia&#x27;s carbon tax, expected to take effect in 2026. Although the tax initially targets specific industries, extending carbon awareness to everyone in Malaysia is both important and timely.&lt;&#x2F;p&gt;
&lt;p&gt;European countries are already implementing ESG-centric policies in various industries, such as the automobile industry. These are just preliminary steps; now we want to delve deeper into the subject.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;payhack-2024.vercel.app&#x2F;&quot;&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;3034bd0a-9dae-4078-80a9-a641c747b58a?format=jpeg&quot; alt=&quot;Eco Finance Interface&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;understanding-the-idea&quot;&gt;Understanding the idea&lt;&#x2F;h2&gt;
&lt;p&gt;Do you know how much carbon footprint you generate from ordering a Shopee parcel? Or how much carbon you generate by driving to work instead of taking public transport? It&#x27;s challenging for people to visualize their footprint, so we set out to make it visible and raise awareness of the issue. According to Visa, 80% of Malaysians are aware of the environmental impact of their consumption. And with Malaysia&#x27;s largest payment network now linking banks through open finance, the transaction data needed to surface that footprint is finally within reach.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;how-it-works&quot;&gt;How it works&lt;&#x2F;h2&gt;
&lt;p&gt;The OpenFinance API can seamlessly integrate people&#x27;s transaction details, allowing us to aggregate a person&#x27;s transactions and their carbon footprints. The calculation would be the emission factor times the amount, giving us the carbon footprint from those transactions.&lt;&#x2F;p&gt;
&lt;p&gt;For example, if the emission factor for a specific merchant category is 0.5 kg CO2 per RM, and a person spends RM 100, the carbon footprint would be 0.5 kg CO2&#x2F;RM * 100 RM = 50 kg CO2.&lt;&#x2F;p&gt;
&lt;p&gt;There&#x27;s an established merchant category code (MCC) in transaction details, where we only need to fine-tune and investigate the actual emission factors for each merchant code. This could be done by collaborating with the Department of Statistics Malaysia (DOSM) to conduct surveys and research among Malaysians. In this project, we are using dummy variables based on this ideology only.&lt;&#x2F;p&gt;
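&lt;p&gt;The calculation above can be sketched in a few lines of Python. The MCC keys and per-category factors below are hypothetical placeholders (the prototype itself used dummy values); the default factor reproduces the worked example of 0.5 kg CO2 per RM:&lt;&#x2F;p&gt;

```python
# Sketch of the footprint calculation described above. The MCC keys and
# emission factors are illustrative dummy values, as in the prototype;
# real factors would come from DOSM survey data.
DEFAULT_FACTOR = 0.5  # kg CO2 per RM, matches the worked example

EMISSION_FACTORS = {   # kg CO2 per RM spent, keyed by merchant category code
    "5411": 0.25,      # grocery stores and supermarkets
    "5541": 0.75,      # service / petrol stations
    "4111": 0.125,     # local commuter transport
}

def footprint_kg(mcc: str, amount_rm: float) -> float:
    """Carbon footprint of one transaction: emission factor x amount."""
    return EMISSION_FACTORS.get(mcc, DEFAULT_FACTOR) * amount_rm

def total_footprint_kg(transactions: list[tuple[str, float]]) -> float:
    """Aggregate a person's transactions into one footprint figure."""
    return sum(footprint_kg(mcc, amount) for mcc, amount in transactions)

# The worked example from the text: 0.5 kg CO2/RM x RM 100 = 50 kg CO2.
print(footprint_kg("0000", 100))  # -> 50.0
```

&lt;p&gt;Because the MCC already rides along with every transaction, the whole pipeline reduces to a lookup and a multiply; the hard part is calibrating the factor table, not the code.&lt;&#x2F;p&gt;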
&lt;h2 id=&quot;monetizing-the-ecosystem&quot;&gt;Monetizing the Ecosystem&lt;&#x2F;h2&gt;
&lt;p&gt;We keep this whole ecosystem circulating by monetizing it. Following Worldometer&#x27;s figures, we take the average carbon emission per person in Malaysia, 8 tons per year, as the baseline; anyone who emits less than that generates a surplus, which we convert into carbon credits and pool. There&#x27;s proven market potential: the Bursa Carbon Exchange (BCX), established on 9 December 2022, already trades carbon credits in Malaysia. We then sell the pooled credits to major Malaysian companies like Petronas and Maybank to help them offset their emissions. &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.allenandgledhill.com&#x2F;perspectives&#x2F;publications&#x2F;bulletins-malaysia&#x2F;2023&#x2F;bursa-malaysia-launches-voluntary-carbon-market-exchange&#x2F;&quot;&gt;Read more about BCX here&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=1QKwHFVsEXE&quot;&gt;Watch the accompanying video on YouTube&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;From &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.petronas.com&#x2F;sustainability&#x2F;delivering-net-zero&quot;&gt;Petronas&lt;&#x2F;a&gt;, it&#x27;s evident that their future plan aims for net-zero carbon emission by 2050. This proves there is a market in Malaysia, and more people will enter the market and participate.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;encouraging-eco-friendly-practices&quot;&gt;Encouraging Eco-Friendly Practices&lt;&#x2F;h2&gt;
&lt;p&gt;How do we encourage eco-friendly spending habits in Malaysia? We aim to attract everyone, even those who are not initially interested in eco-friendly practices. By rewarding users who spend in environmentally friendly ways with cash, we can make Malaysia greener and more sustainable, at least in the ESG sense.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;f54512bd-f014-43ac-a1b1-39628b5990d7?format=jpeg&quot; alt=&quot;The whole business plan of eco-finance&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;sustainable-business-model&quot;&gt;Sustainable Business Model&lt;&#x2F;h2&gt;
&lt;p&gt;We can summarize the business circulation here and make it sustainable as well, where we can become self-sustaining:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Collect carbon credits from users&lt;&#x2F;li&gt;
&lt;li&gt;Pool up carbon credits&lt;&#x2F;li&gt;
&lt;li&gt;Certify carbon credits with Bursa Carbon Exchange (BCX)&lt;&#x2F;li&gt;
&lt;li&gt;Sell carbon credits to companies who need them&lt;&#x2F;li&gt;
&lt;li&gt;Reward users to encourage eco-friendly spending habits&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Basically that&#x27;s it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;project-random-thingy&quot;&gt;Project Random Thingy&lt;&#x2F;h2&gt;
&lt;p&gt;Now for the random odds and ends of the project. The prototype is hosted online; it&#x27;s not fully complete, and it&#x27;s a prototype rather than an MVP. Considering we had only 24 hours to go from idea to execution, I mean, it&#x27;s good for a first-timer. Right...?&lt;&#x2F;p&gt;
&lt;p&gt;You can access the hosted project prototype &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;payhack-2024.vercel.app&#x2F;&quot;&gt;here&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Feel free to browse through it. If you have any questions, please email me, and I&#x27;ll personally explain it to you. You can also see the admin page by accessing &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;payhack-2024.vercel.app&#x2F;admin&quot;&gt;here&lt;&#x2F;a&gt; or by changing the &lt;code&gt;&#x2F;dashboard&lt;&#x2F;code&gt; to &lt;code&gt;&#x2F;admin&lt;&#x2F;code&gt;. It looks something like this:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;7ee8a2c3-551c-4726-9551-f7d6ab743391?format=jpeg&quot; alt=&quot;Admin Dashboard&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;wrap-up&quot;&gt;Wrap-Up&lt;&#x2F;h2&gt;
&lt;p&gt;In conclusion, Eco Finance aims to make carbon footprints visible to individuals, encouraging eco-friendly spending habits and contributing to a greener Malaysia. By monetizing carbon credits and rewarding users, we create a sustainable ecosystem that benefits both the environment and the economy. It&#x27;s a real pity that we couldn&#x27;t make it to the finals though T.T&lt;&#x2F;p&gt;
&lt;p&gt;Hope you guys like it :D&lt;&#x2F;p&gt;
&lt;p&gt;Oh btw! Once again: you can read the full story &lt;a href=&quot;&#x2F;blog&#x2F;first-hackathon-experience&#x2F;&quot;&gt;here&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>First Physical Hackathon Experience</title>
        <published>2024-12-02T00:00:00+00:00</published>
        <updated>2024-12-02T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/blog/first-hackathon-experience/"/>
        <id>https://jienweng.github.io/blog/first-hackathon-experience/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/blog/first-hackathon-experience/">&lt;p&gt;This post records what our first physical hackathon taught us as a math-heavy team entering a software-first environment. I focus on concrete lessons from ideation, mentoring, and pitching that we can reuse in future competitions.&lt;&#x2F;p&gt;
&lt;p&gt;I found myself staring at my phone, thumb hovering over the share button. &quot;Am I really qualified for this?&quot; I thought to myself. &quot;What if we make fools of ourselves?&quot; The doubts crept in like unwanted guests. But then another voice, stronger and more determined, pushed back: &quot;When else will we get a chance like this? We might not be coders, but we know how to solve problems. Isn&#x27;t that what hackathons are really about?&quot;&lt;&#x2F;p&gt;
&lt;p&gt;I reached out to my fellow mathematics coursemates: Janice, Roius, and Gwyn. &quot;Hey, want to do something crazy?&quot; I asked, half expecting them to laugh it off. To my surprise, their responses came quickly, filled with enthusiasm despite (or maybe because of) our collective inexperience. Only 2 of us had ever participated in a hackathon before, and Gwyn had never even written a line of code. But there we were, four mathematics students from UTAR, signing up for one of the most competitive hackathons in the country.&lt;&#x2F;p&gt;
&lt;p&gt;The looks we got when we arrived were priceless. &quot;Are you guys from Computer Science?&quot; someone asked, eyeing our team with curiosity. We exchanged glances and grinned. &quot;Nope, we&#x27;re Math students, haha...&quot; The mixture of surprise and skepticism on their faces was something I&#x27;ll never forget. In those moments, our outsider status felt both terrifying and weirdly empowering.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;25b6ad42-8235-45ae-9505-a2c296a8ca2a?format=jpeg&quot; alt=&quot;Our first breakfast together at PayHack 2024 - nervous but excited! The calm before the storm.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;November 30th marked our first day, and it was a blur of ideation and learning. We came up with an innovative idea: creating an app to visualize transaction carbon footprints, with the ability to pool and trade carbon footprint surpluses with companies in need. On paper, it sounded promising—a perfect blend of fintech and environmental consciousness.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;b16e0c66-cefb-4cc6-8bf6-de5a087cc513?format=jpeg&quot; alt=&quot;Getting grilled during our mentoring session - each question pushed us to think deeper about our solution&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Then came the intense mentoring sessions with Johan Nasir. He didn&#x27;t hold back. &quot;What happens if the carbon credits are manipulated?&quot; he&#x27;d challenge. &quot;How do you ensure the authenticity of the footprint data?&quot; Another round of rethinking. Each session felt like an intense rotan session—tough love at its finest. He&#x27;d poke holes in our solutions, push us to think deeper, and force us to confront real-world problems that actually needed solving.&lt;&#x2F;p&gt;
&lt;p&gt;The week before the hackathon was a rollercoaster. Competing against almost 100 teams from across Malaysia, we were shocked to make it to the Top 32. When we saw Johan&#x27;s name among our judges and received the news of our advancement, our excitement was through the roof. But reality quickly set in—we had a major problem. None of us had real web development experience. My knowledge was limited to Python, SQL, and vanilla HTML&#x2F;CSS&#x2F;JavaScript.&lt;&#x2F;p&gt;
&lt;p&gt;With just four days between Tuesday and Friday, we had to make crucial technical decisions while juggling our internships. After intense research, we settled on Vue.js and Flask. Janice even traveled all the way from JB to KL for this. Every evening after our internships, we&#x27;d dive into tutorials, trying to absorb as much as we could about our chosen tech stack.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;0d444922-77c4-4d91-b645-fb96b7fd5d17?format=jpeg&quot; alt=&quot;3 AM and still debugging - running on determination and coffee&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The hackathon itself was intense. When exhaustion hit, we took turns napping wherever we could. I couldn&#x27;t get a tent, so I made do with a random bench for quick 4-hour power naps before jumping back into coding. Everything seemed to be going smoothly until the morning of the submission. At 8:30 AM, just after breakfast and 90 minutes before the deadline, our backend crashed—the information couldn&#x27;t be parsed properly. In a desperate move, we had to hardcode some components just to make the submission deadline at 10 AM.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;0784bdc4-dc31-4b2b-9d5a-2e04677a9ba1?format=jpeg&quot; alt=&quot;Presenting our final product: MouManTai - tracking and trading carbon footprints from financial transactions&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;While we didn&#x27;t make it to the Top 10, watching the final pitches was an eye-opening experience. The winning teams showcased solutions that were not just technically impressive but also deeply thoughtful about real-world implementation. One team&#x27;s blockchain-based remittance system particularly stood out - their attention to regulatory compliance and market research was incredible. &quot;We should have done more market validation,&quot; I thought to myself. &quot;Next time, we need to focus not just on the technical solution but on the whole business case.&quot;&lt;&#x2F;p&gt;
&lt;p&gt;The top teams also demonstrated masterful presentation skills. Their pitches weren&#x27;t just about features - they told compelling stories about why their solutions mattered. Each slide was carefully crafted, each demo was flawlessly executed, and their responses to judges&#x27; questions showed deep understanding of both technical and business aspects. I made mental notes: &quot;Practice the pitch more. Know your numbers. Be ready for any question.&quot;&lt;&#x2F;p&gt;
&lt;p&gt;Despite not making the finals, I did win a Samsung monitor in the lucky draw, a small consolation that brought some laughs to our tired team.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;3d119ab7-d3bf-4632-8675-6fbaf1963c08?format=jpeg&quot; alt=&quot;A silver lining - winning a Samsung monitor in the lucky draw!&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Looking back now, the sleepless nights and endless debugging sessions blur together, but certain moments stand crystal clear: the late-night breakthrough when our first feature finally worked, the proud smile on Johan&#x27;s face during our final presentation, and most importantly, the unshakeable bond formed between four mathematicians who dared to dream.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;266c9030-58ad-4536-9e60-c88edb6df8a9?format=jpeg&quot; alt=&quot;Jien Weng, Janice, Gwyn and Roius&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;To Janice, Roius, and Gwyn: thank you for taking this leap of faith with me. For believing that our mathematical minds could contribute something meaningful to the tech world. To Johan: your guidance went beyond mentorship—you showed us that innovation comes from daring to be different. And to PayNet and JomHack: thank you for creating a space where even mathematics students could discover their potential in technology.&lt;&#x2F;p&gt;
&lt;p&gt;They say the best stories come from stepping out of your comfort zone. Well, we didn&#x27;t just step—we took a giant leap. And while our first hackathon journey has ended, something tells me this is just the beginning of our adventure in the tech world. The equations and formulas we&#x27;ve studied for years are no longer just abstract concepts—they&#x27;re tools waiting to be applied in the vast playground of technology.&lt;&#x2F;p&gt;
&lt;p&gt;After all, who says mathematicians can&#x27;t be hackers too? Sometimes the best innovations come from those who dare to cross the boundaries between disciplines, who bring fresh perspectives to old problems. And maybe, just maybe, that&#x27;s exactly what the tech world needs more of.&lt;&#x2F;p&gt;
</content>
        
    </entry>
</feed>
