<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <title>Jien Weng</title>
    <subtitle>Research notes, essays, publications, and working papers by Jien Weng.</subtitle>
    <link rel="self" type="application/atom+xml" href="https://jienweng.github.io/atom.xml"/>
    <link rel="alternate" type="text/html" href="https://jienweng.github.io"/>
    <generator uri="https://www.getzola.org/">Zola</generator>
    <updated>2026-03-03T00:00:00+00:00</updated>
    <id>https://jienweng.github.io/atom.xml</id>
    <entry xml:lang="en">
        <title>Making Linear Algebra Make Sense with Gemini CLI</title>
        <published>2026-03-03T00:00:00+00:00</published>
        <updated>2026-03-03T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/notes/linear-algebra-sense-making/"/>
        <id>https://jienweng.github.io/notes/linear-algebra-sense-making/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/notes/linear-algebra-sense-making/">&lt;p&gt;This note shows a practical way to teach linear algebra to adult learners: start from geometric intuition, then move to notation only when needed. The goal is to make vectors, linear maps, and matrix multiplication operational rather than abstract. If you teach, self-study, or mentor beginners, this is a compact framework you can reuse.&lt;&#x2F;p&gt;
&lt;p&gt;Linear algebra is the engine behind AI and 3D graphics. Most adult learners find it dry and abstract. Textbooks often assume you have endless patience for matrix operations. Adults need things to actually make sense. They have real jobs and limited time. They cannot spend weeks wondering why they are multiplying rows by columns.&lt;&#x2F;p&gt;
&lt;p&gt;Last weekend I tried something different. I used Gemini CLI to help a learner who was stuck on the traditional formula-first path. The goal was simple: find the sweet spot between math and common sense. We moved away from the rigid structure of a classroom and treated the math like a puzzle to solve together.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-gemini-cli&quot;&gt;Why Gemini CLI&lt;&#x2F;h2&gt;
&lt;p&gt;Standard textbooks are a one-way street. You read and you struggle. If it does not click you are just stuck. Gemini CLI changes the game. It creates an interactive sense-making loop. It acts as a bridge between the abstract symbols and the human intuition.&lt;&#x2F;p&gt;
&lt;p&gt;When my learner hit a wall, we did not just re-read definitions. We asked Gemini to explain concepts through things he already knew. We used the CLI to quickly pivot between explanations until one finally clicked. This speed is critical. In a normal setting a student might wait days for a tutor to find a better analogy. With the CLI we found five analogies in five minutes. We kept the momentum alive.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;balancing-math-and-meaning&quot;&gt;Balancing Math and Meaning&lt;&#x2F;h2&gt;
&lt;p&gt;The biggest hurdle is the gap between symbols and significance. Computing a dot product is easy. Explaining why it matters is the hard part. Many people can do the math but few can see the picture.&lt;&#x2F;p&gt;
&lt;p&gt;Gemini worked as an amazing teaching assistant. We followed a specific flow.&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Iterate on Intuition:&lt;&#x2F;strong&gt; Start with a simple analogy. Maybe we talk about how light hits a surface or how similar two songs are. Then we tighten it with math.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Context:&lt;&#x2F;strong&gt; Instantly see where these concepts live in the real world. We would ask for a Python snippet that used the concept we just discussed.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Bridge the Gap:&lt;&#x2F;strong&gt; Translate intuition back into formal notation. Once the learner understands the why, the how becomes trivial.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
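&lt;p&gt;As a sketch of step 2, here is the kind of minimal Python snippet we would ask for. The song feature numbers are invented for illustration:&lt;&#x2F;p&gt;

```python
# Dot product as a similarity score between two "taste" vectors.
# The feature values below are made up for illustration.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Each vector: (tempo, energy, acousticness) on a 0-to-1 scale.
song_a = [0.9, 0.8, 0.1]
song_b = [0.8, 0.9, 0.2]   # similar vibe to song_a
song_c = [0.1, 0.2, 0.9]   # very different vibe

print(dot(song_a, song_b))  # large: the vectors point the same way
print(dot(song_a, song_c))  # small: the vectors barely align
```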
&lt;p&gt;The math is not the enemy. The lack of context is. By bringing the math into the CLI we made it tangible. We made it something you can poke and prod until it makes sense.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;it-actually-worked&quot;&gt;It Actually Worked&lt;&#x2F;h2&gt;
&lt;p&gt;The breakthrough happened during our last session. No more frustration. There was a clear aha moment. He was not just following a recipe. He understood the logic. He could see the vectors moving. He could feel the transformation happening in the space.&lt;&#x2F;p&gt;
&lt;p&gt;Using LLMs to build mental models is a total game-changer for adult education. It turns the struggle into a productive exploration. It respects the learner&#x27;s time and intelligence. It does not treat them like a child. It treats them like a collaborator.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;final-thought&quot;&gt;Final Thought&lt;&#x2F;h2&gt;
&lt;p&gt;Linear algebra does not have to be a slog through notation. Balance the rigour with intuition. Use tools like Gemini CLI to make complex ideas accessible. Break down the walls between you and understanding.&lt;&#x2F;p&gt;
&lt;p&gt;Give this AI-assisted intuition method a shot. It might be the bridge you need.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;get-the-sense-maker-s-playbook&quot;&gt;Get the Sense-Maker&#x27;s Playbook&lt;&#x2F;h2&gt;
&lt;p&gt;Stop struggling with dry textbooks. I have compiled the exact method and Gemini CLI prompts I used to help my learner achieve a breakthrough in one weekend.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;The Linear Algebra Sense-Maker&#x27;s PDF includes:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;A roadmap for intuitive linear algebra.&lt;&#x2F;li&gt;
&lt;li&gt;Prompt templates for your own sense-making CLI.&lt;&#x2F;li&gt;
&lt;li&gt;The eigenvector and dot product case studies.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;checkya.com&#x2F;jienweng&#x2F;offers&#x2F;31145&quot;&gt;Support my work and download the PDF to start making sense of math today.&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Correlation</title>
        <published>2026-01-23T00:00:00+00:00</published>
        <updated>2026-01-23T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/notes/regression-04-correlation/"/>
        <id>https://jienweng.github.io/notes/regression-04-correlation/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/notes/regression-04-correlation/">&lt;p&gt;This note defines correlation clearly, explains how to compute it, and separates association from causation. The key issue is that correlation is often over-interpreted as evidence of mechanism. The goal is to make correlation a diagnostic tool, not a conclusion.&lt;&#x2F;p&gt;
&lt;p&gt;In the previous posts, we built the regression framework using $S_{xx}$ and $S_{xy}$ to derive the OLS estimators. Before we move to deeper inferential topics, it is essential to formalize a measure of the strength of the linear association between two variables. This measure is the Pearson correlation coefficient.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;recap-of-key-notation&quot;&gt;Recap of Key Notation&lt;&#x2F;h2&gt;
&lt;p&gt;In &lt;a href=&quot;&#x2F;posts&#x2F;regression-01-simple-linear-regression&quot;&gt;Simple Linear Regression&lt;&#x2F;a&gt;, we introduced:&lt;&#x2F;p&gt;
&lt;p&gt;$$S_{xx} = \sum^n_{i=1}(x_i - \bar{x})^2,$$&lt;&#x2F;p&gt;
&lt;p&gt;$$S_{xy} = \sum^n_{i=1}(x_i - \bar{x})(y_i - \bar{y}).$$&lt;&#x2F;p&gt;
&lt;p&gt;We now introduce the remaining quantity:&lt;&#x2F;p&gt;
&lt;p&gt;$$S_{yy} = \sum^n_{i=1}(y_i - \bar{y})^2.$$&lt;&#x2F;p&gt;
&lt;p&gt;These three summary statistics capture the variability of $x$, the variability of $y$, and their joint variability, respectively.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;sample-standard-deviations&quot;&gt;Sample Standard Deviations&lt;&#x2F;h2&gt;
&lt;p&gt;The sample standard deviations are defined as:&lt;&#x2F;p&gt;
&lt;p&gt;$$S_x = \sqrt{\frac{S_{xx}}{n - 1}} = \sqrt{\frac{1}{n-1}\sum^n_{i=1}(x_i - \bar{x})^2},$$&lt;&#x2F;p&gt;
&lt;p&gt;$$S_y = \sqrt{\frac{S_{yy}}{n - 1}} = \sqrt{\frac{1}{n-1}\sum^n_{i=1}(y_i - \bar{y})^2}.$$&lt;&#x2F;p&gt;
&lt;p&gt;Note that $S_x$ and $S_y$ use the $n - 1$ denominator (Bessel&#x27;s correction), which makes the sample variances $S_x^2$ and $S_y^2$ unbiased estimators of the population variances $\sigma_x^2$ and $\sigma_y^2$. The standard deviations themselves remain slightly biased, but this is the standard convention.&lt;&#x2F;p&gt;
&lt;p&gt;The sample variances are simply:&lt;&#x2F;p&gt;
&lt;p&gt;$$S_x^2 = \frac{S_{xx}}{n-1}, \quad S_y^2 = \frac{S_{yy}}{n-1}.$$&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-pearson-correlation-coefficient&quot;&gt;The Pearson Correlation Coefficient&lt;&#x2F;h2&gt;
&lt;p&gt;The sample Pearson correlation coefficient is defined as:&lt;&#x2F;p&gt;
&lt;p&gt;$$r = \frac{S_{xy}}{\sqrt{S_{xx} \cdot S_{yy}}}.$$&lt;&#x2F;p&gt;
&lt;p&gt;Equivalently, using the standard deviations:&lt;&#x2F;p&gt;
&lt;p&gt;$$r = \frac{\sum^n_{i=1}(x_i - \bar{x})(y_i - \bar{y})}{(n-1) S_x S_y} = \frac{S_{xy}}{(n-1) S_x S_y}.$$&lt;&#x2F;p&gt;
&lt;p&gt;The first form (using $S_{xx}$, $S_{yy}$, $S_{xy}$) is often more convenient for algebraic manipulation, while the second form highlights that $r$ is a standardized measure of covariation.&lt;&#x2F;p&gt;
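&lt;p&gt;The equivalence of the two forms is easy to verify numerically. A small Python sketch, with toy data invented for illustration:&lt;&#x2F;p&gt;

```python
import math

# Toy data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.9]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Summary statistics S_xx, S_yy, S_xy.
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Form 1: r = S_xy / sqrt(S_xx * S_yy).
r1 = sxy / math.sqrt(sxx * syy)

# Form 2: r = S_xy / ((n - 1) * S_x * S_y).
sx = math.sqrt(sxx / (n - 1))
sy = math.sqrt(syy / (n - 1))
r2 = sxy / ((n - 1) * sx * sy)

print(r1, r2)  # both forms agree
```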
&lt;h2 id=&quot;properties-of-the-correlation-coefficient&quot;&gt;Properties of the Correlation Coefficient&lt;&#x2F;h2&gt;
&lt;p&gt;The Pearson correlation coefficient has several important properties:&lt;&#x2F;p&gt;
&lt;h3 id=&quot;1-bounded-between-1-and-1&quot;&gt;1. Bounded between -1 and 1&lt;&#x2F;h3&gt;
&lt;p&gt;$$-1 \leq r \leq 1.$$&lt;&#x2F;p&gt;
&lt;p&gt;This follows from the Cauchy-Schwarz inequality, which states that $(S_{xy})^2 \leq S_{xx} \cdot S_{yy}$, with equality only when all data points lie exactly on a line.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;2-interpretation-of-values&quot;&gt;2. Interpretation of values&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;$r = 1$: perfect positive linear relationship (all points on a line with positive slope).&lt;&#x2F;li&gt;
&lt;li&gt;$r = -1$: perfect negative linear relationship (all points on a line with negative slope).&lt;&#x2F;li&gt;
&lt;li&gt;$r = 0$: no linear relationship (but there may be a nonlinear relationship).&lt;&#x2F;li&gt;
&lt;li&gt;$0 &amp;lt; |r| &amp;lt; 1$: partial linear association, with strength increasing as $|r|$ approaches 1.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;3-symmetry&quot;&gt;3. Symmetry&lt;&#x2F;h3&gt;
&lt;p&gt;$$r_{xy} = r_{yx}.$$&lt;&#x2F;p&gt;
&lt;p&gt;The correlation between $x$ and $y$ is the same as the correlation between $y$ and $x$. This follows because $S_{xy}$ is symmetric in $x$ and $y$.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;4-invariance-under-linear-transformation&quot;&gt;4. Invariance under linear transformation&lt;&#x2F;h3&gt;
&lt;p&gt;If we define $u_i = a + bx_i$ and $v_i = c + dy_i$ with $b &amp;gt; 0$ and $d &amp;gt; 0$, then the correlation between $u$ and $v$ equals the correlation between $x$ and $y$. If either $b$ or $d$ is negative (but not both), the sign of $r$ flips.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;5-dimensionless&quot;&gt;5. Dimensionless&lt;&#x2F;h3&gt;
&lt;p&gt;The correlation coefficient has no units. The numerator $S_{xy}$ has units of $x$ times $y$, and the denominator $\sqrt{S_{xx} \cdot S_{yy}}$ also has units of $x$ times $y$, so they cancel.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;relationship-to-the-regression-slope&quot;&gt;Relationship to the Regression Slope&lt;&#x2F;h2&gt;
&lt;p&gt;Recall from Post 1 that the OLS slope estimator is $\hat{\beta}_1 = S_{xy}&#x2F;S_{xx}$. The correlation coefficient can be expressed in terms of $\hat{\beta}_1$:&lt;&#x2F;p&gt;
&lt;p&gt;$$r = \hat{\beta}_1 \sqrt{\frac{S_{xx}}{S_{yy}}} = \hat{\beta}_1 \cdot \frac{S_x}{S_y}.$$&lt;&#x2F;p&gt;
&lt;p&gt;This shows that $r$ and $\hat{\beta}_1$ always share the same sign. A positive slope corresponds to a positive correlation, and a negative slope corresponds to a negative correlation.&lt;&#x2F;p&gt;
&lt;p&gt;Conversely, we can write the slope as:&lt;&#x2F;p&gt;
&lt;p&gt;$$\hat{\beta}_1 = r \cdot \frac{S_y}{S_x}.$$&lt;&#x2F;p&gt;
&lt;p&gt;This expresses the regression slope as the correlation times the ratio of the standard deviations.&lt;&#x2F;p&gt;
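&lt;p&gt;The identity $\hat{\beta}_1 = r \cdot S_y &#x2F; S_x$ can be checked with the same summary-statistic approach (toy data, for illustration):&lt;&#x2F;p&gt;

```python
import math

# Toy data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.1, 3.8, 5.1]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

beta1 = sxy / sxx                  # OLS slope
r = sxy / math.sqrt(sxx * syy)     # correlation
sx = math.sqrt(sxx / (n - 1))
sy = math.sqrt(syy / (n - 1))

print(beta1, r * sy / sx)  # identical values
```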
&lt;h2 id=&quot;population-correlation&quot;&gt;Population Correlation&lt;&#x2F;h2&gt;
&lt;p&gt;The sample correlation $r$ estimates the population correlation coefficient $\rho$ (rho), defined for a bivariate population as:&lt;&#x2F;p&gt;
&lt;p&gt;$$\rho = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}.$$&lt;&#x2F;p&gt;
&lt;p&gt;The sample correlation $r$ is a consistent estimator of $\rho$ whenever the population variances and covariance exist. The stronger assumption that the data $(x_i, y_i)$ come from a bivariate normal distribution matters when constructing exact tests and confidence intervals for $\rho$.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;correlation-does-not-imply-causation&quot;&gt;Correlation Does Not Imply Causation&lt;&#x2F;h2&gt;
&lt;p&gt;A strong correlation between two variables does not mean that one causes the other. The association might be due to a lurking variable, reverse causation, or coincidence. Regression models describe associations; establishing causation requires careful experimental design or additional assumptions.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;&#x2F;h2&gt;
&lt;p&gt;In this post, we introduced the Pearson correlation coefficient:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;$S_{yy} = \sum(y_i - \bar{y})^2$ completes the set of summary statistics alongside $S_{xx}$ and $S_{xy}$.&lt;&#x2F;li&gt;
&lt;li&gt;The sample standard deviations $S_x = \sqrt{S_{xx}&#x2F;(n-1)}$ and $S_y = \sqrt{S_{yy}&#x2F;(n-1)}$ measure spread.&lt;&#x2F;li&gt;
&lt;li&gt;The correlation $r = S_{xy}&#x2F;\sqrt{S_{xx} \cdot S_{yy}}$ quantifies the strength of the linear relationship.&lt;&#x2F;li&gt;
&lt;li&gt;$r$ is bounded between $-1$ and $1$, symmetric, and dimensionless.&lt;&#x2F;li&gt;
&lt;li&gt;The regression slope and correlation are related by $\hat{\beta}_1 = r \cdot S_y &#x2F; S_x$.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;In the next post, we will define $R^2$ using SST, SSE, and SSR, and prove mathematically that $R^2 = r^2$ for simple linear regression.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Polynomial Regression</title>
        <published>2026-01-16T00:00:00+00:00</published>
        <updated>2026-01-16T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/notes/regression-03-polynomial-regression/"/>
        <id>https://jienweng.github.io/notes/regression-03-polynomial-regression/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/notes/regression-03-polynomial-regression/">&lt;p&gt;This note explains polynomial regression as a linear model in transformed features, not a different estimation framework. The motivation is to capture nonlinear trends while retaining familiar regression machinery. I highlight when polynomial terms help and when they mainly increase variance.&lt;&#x2F;p&gt;
&lt;p&gt;In the &lt;a href=&quot;&#x2F;posts&#x2F;regression-02-multiple-linear-regression&quot;&gt;previous post&lt;&#x2F;a&gt;, we introduced multiple linear regression using matrix notation. One natural question arises: what if the relationship between $x$ and $y$ is not a straight line? Polynomial regression addresses this by fitting a polynomial function of a single variable $x$, and it turns out to be a special case of the multiple linear regression framework we have already developed.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-polynomial-model&quot;&gt;The Polynomial Model&lt;&#x2F;h2&gt;
&lt;p&gt;A polynomial regression model of degree $d$ takes the form:&lt;&#x2F;p&gt;
&lt;p&gt;$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_d x_i^d + \varepsilon_i.$$&lt;&#x2F;p&gt;
&lt;p&gt;This model is nonlinear in the variable $x$, but it is still linear in the parameters $\beta_0, \beta_1, \ldots, \beta_d$. This distinction is important because it means we can use the same OLS estimation procedure from multiple linear regression.&lt;&#x2F;p&gt;
&lt;p&gt;For example, a quadratic model ($d = 2$) is:&lt;&#x2F;p&gt;
&lt;p&gt;$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i,$$&lt;&#x2F;p&gt;
&lt;p&gt;and a cubic model ($d = 3$) is:&lt;&#x2F;p&gt;
&lt;p&gt;$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \varepsilon_i.$$&lt;&#x2F;p&gt;
&lt;h2 id=&quot;design-matrix-construction&quot;&gt;Design Matrix Construction&lt;&#x2F;h2&gt;
&lt;p&gt;To fit a polynomial regression using the matrix approach, we define new variables:&lt;&#x2F;p&gt;
&lt;p&gt;$$z_1 = x, \quad z_2 = x^2, \quad z_3 = x^3, \quad \ldots, \quad z_d = x^d.$$&lt;&#x2F;p&gt;
&lt;p&gt;The design matrix for a polynomial of degree $d$ with $n$ observations is:&lt;&#x2F;p&gt;
&lt;p&gt;$$\mathbf{X} = \begin{pmatrix} 1 &amp;amp; x_1 &amp;amp; x_1^2 &amp;amp; \cdots &amp;amp; x_1^d \\ 1 &amp;amp; x_2 &amp;amp; x_2^2 &amp;amp; \cdots &amp;amp; x_2^d \\ \vdots &amp;amp; \vdots &amp;amp; \vdots &amp;amp; \ddots &amp;amp; \vdots \\ 1 &amp;amp; x_n &amp;amp; x_n^2 &amp;amp; \cdots &amp;amp; x_n^d \end{pmatrix}.$$&lt;&#x2F;p&gt;
&lt;p&gt;This is exactly the design matrix for a multiple linear regression with $d$ predictors $z_1, z_2, \ldots, z_d$. Therefore, the OLS estimator is:&lt;&#x2F;p&gt;
&lt;p&gt;$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y},$$&lt;&#x2F;p&gt;
&lt;p&gt;which is the same formula we derived in the previous post.&lt;&#x2F;p&gt;
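&lt;p&gt;A quadratic fit along these lines can be sketched in a few lines of NumPy. The data here are synthetic, generated from a known quadratic so we can see the estimator recover it:&lt;&#x2F;p&gt;

```python
import numpy as np

# Synthetic data from a known quadratic, for illustration.
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 30)
y = 1.0 + 0.5 * x - 2.0 * x**2 + rng.normal(0, 0.1, size=x.shape)

# Design matrix with columns 1, x, x^2 (degree d = 2).
X = np.vander(x, 3, increasing=True)

# OLS fit; lstsq solves the normal equations in a numerically stable way.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)  # close to the true coefficients [1.0, 0.5, -2.0]
```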
&lt;h2 id=&quot;why-polynomial-regression-is-a-special-case-of-mlr&quot;&gt;Why Polynomial Regression is a Special Case of MLR&lt;&#x2F;h2&gt;
&lt;p&gt;The key insight is that &quot;linear&quot; in &quot;linear regression&quot; refers to linearity in the parameters, not in the predictors. Although the polynomial model includes terms like $x^2$ and $x^3$, each coefficient $\beta_j$ appears linearly. We can treat each power of $x$ as a separate predictor variable, and the entire OLS theory from multiple linear regression applies directly.&lt;&#x2F;p&gt;
&lt;p&gt;This means all the results we derived earlier carry over:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;The normal equations $\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{Y}$ hold.&lt;&#x2F;li&gt;
&lt;li&gt;The hat matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ projects $\mathbf{Y}$ onto $\hat{\mathbf{Y}}$.&lt;&#x2F;li&gt;
&lt;li&gt;Residual properties remain the same.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;choosing-the-degree&quot;&gt;Choosing the Degree&lt;&#x2F;h2&gt;
&lt;p&gt;A natural question is: what degree $d$ should we use? The choice involves a tradeoff between model flexibility and model complexity.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Too low a degree&lt;&#x2F;strong&gt; (underfitting): The model is not flexible enough to capture the true relationship, leading to large systematic errors.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Too high a degree&lt;&#x2F;strong&gt; (overfitting): The model fits the training data very closely, including the noise, but performs poorly on new data.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;A polynomial of degree $n - 1$ (where $n$ is the number of data points, assuming the $x_i$ are distinct) can pass through every data point exactly, giving $SSE = 0$. However, such a model almost always overfits and generalizes badly.&lt;&#x2F;p&gt;
&lt;p&gt;Common approaches to select the degree include:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Visual inspection&lt;&#x2F;strong&gt;: Plot the data and the fitted curve for different degrees, and choose the one that captures the trend without fitting the noise.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Cross-validation&lt;&#x2F;strong&gt;: Split the data into training and validation sets, fit models of various degrees on the training set, and select the degree with the lowest prediction error on the validation set.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Information criteria&lt;&#x2F;strong&gt;: Use metrics such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) to balance goodness of fit with model complexity.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;h2 id=&quot;multicollinearity-considerations&quot;&gt;Multicollinearity Considerations&lt;&#x2F;h2&gt;
&lt;p&gt;One practical concern with polynomial regression is multicollinearity. The predictors $x, x^2, x^3, \ldots$ are often highly correlated, especially when $x$ values span a narrow range. High multicollinearity inflates the variance of the coefficient estimates and makes $\mathbf{X}^T\mathbf{X}$ nearly singular.&lt;&#x2F;p&gt;
&lt;p&gt;A common remedy is to center the predictor before constructing polynomial terms. Instead of using $x$, we use $x - \bar{x}$:&lt;&#x2F;p&gt;
&lt;p&gt;$$z_1 = x - \bar{x}, \quad z_2 = (x - \bar{x})^2, \quad z_3 = (x - \bar{x})^3, \quad \ldots$$&lt;&#x2F;p&gt;
&lt;p&gt;Centering reduces the correlation among the polynomial terms and improves the numerical stability of the estimation.&lt;&#x2F;p&gt;
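&lt;p&gt;The effect of centering is striking when $x$ spans a narrow range, as a quick NumPy check shows (the range below is invented for illustration):&lt;&#x2F;p&gt;

```python
import numpy as np

# Predictor confined to a narrow positive range, for illustration.
x = np.linspace(10, 12, 50)

# Correlation between x and x^2 before centering: nearly 1.
raw = np.corrcoef(x, x**2)[0, 1]

# After centering, the odd and even terms decorrelate almost entirely.
xc = x - x.mean()
centered = np.corrcoef(xc, xc**2)[0, 1]

print(raw, centered)  # near 1.0 versus near 0.0
```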
&lt;p&gt;Another approach is to use orthogonal polynomials, which are constructed so that $\mathbf{X}^T\mathbf{X}$ is diagonal, eliminating multicollinearity entirely.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;example-quadratic-fit&quot;&gt;Example: Quadratic Fit&lt;&#x2F;h2&gt;
&lt;p&gt;Consider a quadratic model $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i$ with the design matrix:&lt;&#x2F;p&gt;
&lt;p&gt;$$\mathbf{X} = \begin{pmatrix} 1 &amp;amp; x_1 &amp;amp; x_1^2 \\ 1 &amp;amp; x_2 &amp;amp; x_2^2 \\ \vdots &amp;amp; \vdots &amp;amp; \vdots \\ 1 &amp;amp; x_n &amp;amp; x_n^2 \end{pmatrix}.$$&lt;&#x2F;p&gt;
&lt;p&gt;There are $p = 3$ parameters. The OLS solution $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$ gives us $\hat{\beta}_0$, $\hat{\beta}_1$, and $\hat{\beta}_2$ simultaneously. The coefficient $\hat{\beta}_2$ indicates the curvature of the fitted parabola: a positive $\hat{\beta}_2$ means the curve opens upward, and a negative $\hat{\beta}_2$ means it opens downward.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;&#x2F;h2&gt;
&lt;p&gt;In this post, we explored polynomial regression:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;The polynomial model $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_d x_i^d + \varepsilon_i$ is nonlinear in $x$ but linear in the parameters.&lt;&#x2F;li&gt;
&lt;li&gt;By treating each power of $x$ as a separate predictor, polynomial regression becomes a special case of multiple linear regression.&lt;&#x2F;li&gt;
&lt;li&gt;The OLS estimator $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$ applies directly.&lt;&#x2F;li&gt;
&lt;li&gt;Choosing the polynomial degree requires balancing fit and complexity to avoid overfitting.&lt;&#x2F;li&gt;
&lt;li&gt;Centering or using orthogonal polynomials helps address multicollinearity.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;In the next post, we will examine correlation, which quantifies the strength of the linear relationship between two variables using $S_{xx}$, $S_{yy}$, and $S_{xy}$.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Multiple Linear Regression</title>
        <published>2026-01-09T00:00:00+00:00</published>
        <updated>2026-01-09T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/notes/regression-02-multiple-linear-regression/"/>
        <id>https://jienweng.github.io/notes/regression-02-multiple-linear-regression/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/notes/regression-02-multiple-linear-regression/">&lt;p&gt;This note extends linear regression to multiple predictors and explains how interpretation changes once features interact through a shared model. The practical challenge is understanding coefficients conditionally, not in isolation. I focus on the model form, estimation intuition, and common interpretation mistakes.&lt;&#x2F;p&gt;
&lt;p&gt;In the &lt;a href=&quot;&#x2F;posts&#x2F;regression-01-simple-linear-regression&quot;&gt;previous post&lt;&#x2F;a&gt;, we explored simple linear regression with a single predictor. In practice, the response variable $y$ often depends on more than one predictor. Multiple linear regression extends the framework to accommodate $p - 1$ independent variables.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-model&quot;&gt;The Model&lt;&#x2F;h2&gt;
&lt;p&gt;The multiple linear regression model for the $i$-th observation is:&lt;&#x2F;p&gt;
&lt;p&gt;$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i,$$&lt;&#x2F;p&gt;
&lt;p&gt;where $x_{ij}$ is the value of the $j$-th predictor for the $i$-th observation, $\beta_j$ are the regression coefficients, and $\varepsilon_i$ is the error term. The total number of parameters is $p$ (including the intercept $\beta_0$).&lt;&#x2F;p&gt;
&lt;h2 id=&quot;matrix-notation&quot;&gt;Matrix Notation&lt;&#x2F;h2&gt;
&lt;p&gt;Writing out the model for each observation individually becomes cumbersome as the number of predictors grows. Matrix notation provides a compact and powerful alternative.&lt;&#x2F;p&gt;
&lt;p&gt;Define the following:&lt;&#x2F;p&gt;
&lt;p&gt;$$\mathbf{Y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} 1 &amp;amp; x_{11} &amp;amp; x_{12} &amp;amp; \cdots &amp;amp; x_{1,p-1} \\ 1 &amp;amp; x_{21} &amp;amp; x_{22} &amp;amp; \cdots &amp;amp; x_{2,p-1} \\ \vdots &amp;amp; \vdots &amp;amp; \vdots &amp;amp; \ddots &amp;amp; \vdots \\ 1 &amp;amp; x_{n1} &amp;amp; x_{n2} &amp;amp; \cdots &amp;amp; x_{n,p-1} \end{pmatrix},$$&lt;&#x2F;p&gt;
&lt;p&gt;$$\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{pmatrix}, \quad \boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$&lt;&#x2F;p&gt;
&lt;p&gt;The model can now be written as:&lt;&#x2F;p&gt;
&lt;p&gt;$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}.$$&lt;&#x2F;p&gt;
&lt;p&gt;Here, $\mathbf{Y}$ is an $n \times 1$ vector of responses, $\mathbf{X}$ is an $n \times p$ design matrix (the first column of ones accounts for the intercept), $\boldsymbol{\beta}$ is a $p \times 1$ vector of coefficients, and $\boldsymbol{\varepsilon}$ is an $n \times 1$ vector of errors.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;ols-in-matrix-form&quot;&gt;OLS in Matrix Form&lt;&#x2F;h2&gt;
&lt;p&gt;The sum of squared errors can be written in matrix form as:&lt;&#x2F;p&gt;
&lt;p&gt;$$SSE = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}).$$&lt;&#x2F;p&gt;
&lt;p&gt;Expanding this expression:&lt;&#x2F;p&gt;
&lt;p&gt;$$SSE = \mathbf{Y}^T\mathbf{Y} - 2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{Y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}.$$&lt;&#x2F;p&gt;
&lt;p&gt;To minimize, we take the derivative with respect to $\boldsymbol{\beta}$ and set it to zero:&lt;&#x2F;p&gt;
&lt;p&gt;$$\frac{\partial SSE}{\partial \boldsymbol{\beta}} = -2\mathbf{X}^T\mathbf{Y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{0}.$$&lt;&#x2F;p&gt;
&lt;p&gt;This gives us the normal equations in matrix form:&lt;&#x2F;p&gt;
&lt;p&gt;$$\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{Y}.$$&lt;&#x2F;p&gt;
&lt;p&gt;Provided that $\mathbf{X}^T\mathbf{X}$ is invertible (that is, the columns of $\mathbf{X}$ are linearly independent), we can solve for the OLS estimator:&lt;&#x2F;p&gt;
&lt;p&gt;$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}.$$&lt;&#x2F;p&gt;
&lt;p&gt;This single formula generalizes the simple linear regression result. When $p = 2$ (one predictor plus the intercept), this reduces to $\hat{\beta}_1 = S_{xy}&#x2F;S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$ as derived in the previous post.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;interpreting-the-coefficients&quot;&gt;Interpreting the Coefficients&lt;&#x2F;h2&gt;
&lt;p&gt;Each coefficient $\hat{\beta}_j$ (for $j = 1, 2, \ldots, p - 1$) represents the estimated change in $y$ for a one-unit increase in $x_j$, while holding all other predictors constant. This &quot;holding other variables constant&quot; interpretation is what distinguishes multiple regression from running separate simple regressions.&lt;&#x2F;p&gt;
&lt;p&gt;The intercept $\hat{\beta}_0$ represents the estimated value of $y$ when all predictors are equal to zero.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-hat-matrix&quot;&gt;The Hat Matrix&lt;&#x2F;h2&gt;
&lt;p&gt;The vector of fitted values is:&lt;&#x2F;p&gt;
&lt;p&gt;$$\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y} = \mathbf{H}\mathbf{Y},$$&lt;&#x2F;p&gt;
&lt;p&gt;where $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ is called the hat matrix. It &quot;puts a hat on&quot; $\mathbf{Y}$, transforming observed values into fitted values. The hat matrix is symmetric ($\mathbf{H}^T = \mathbf{H}$) and idempotent ($\mathbf{H}^2 = \mathbf{H}$).&lt;&#x2F;p&gt;
&lt;p&gt;The residual vector is:&lt;&#x2F;p&gt;
&lt;p&gt;$$\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = (\mathbf{I} - \mathbf{H})\mathbf{Y}.$$&lt;&#x2F;p&gt;
&lt;h2 id=&quot;assumptions&quot;&gt;Assumptions&lt;&#x2F;h2&gt;
&lt;p&gt;The assumptions for multiple linear regression extend those of simple linear regression:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Linearity&lt;&#x2F;strong&gt;: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, the model is linear in the parameters.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Full rank&lt;&#x2F;strong&gt;: The design matrix $\mathbf{X}$ has full column rank, meaning $\text{rank}(\mathbf{X}) = p$. This ensures $\mathbf{X}^T\mathbf{X}$ is invertible.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Exogeneity&lt;&#x2F;strong&gt;: $E(\boldsymbol{\varepsilon} | \mathbf{X}) = \mathbf{0}$, the errors have zero conditional mean.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Homoscedasticity&lt;&#x2F;strong&gt;: $\text{Var}(\boldsymbol{\varepsilon} | \mathbf{X}) = \sigma^2\mathbf{I}_n$, the errors have constant variance and are uncorrelated.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Normality&lt;&#x2F;strong&gt; (for inference): $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I}_n)$.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;When assumptions 1 through 4 hold, the Gauss-Markov theorem guarantees that $\hat{\boldsymbol{\beta}}$ is the Best Linear Unbiased Estimator (BLUE).&lt;&#x2F;p&gt;
&lt;h2 id=&quot;connection-to-simple-linear-regression&quot;&gt;Connection to Simple Linear Regression&lt;&#x2F;h2&gt;
&lt;p&gt;In the special case where $p = 2$, we have a single predictor $x$ and the design matrix becomes:&lt;&#x2F;p&gt;
&lt;p&gt;$$\mathbf{X} = \begin{pmatrix} 1 &amp;amp; x_1 \\ 1 &amp;amp; x_2 \\ \vdots &amp;amp; \vdots \\ 1 &amp;amp; x_n \end{pmatrix}.$$&lt;&#x2F;p&gt;
&lt;p&gt;Computing $(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$ in this case yields the familiar results $\hat{\beta}_1 = S_{xy}&#x2F;S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$, confirming that the matrix formulation is a true generalization.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;&#x2F;h2&gt;
&lt;p&gt;In this post, we extended the regression framework to handle multiple predictors:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;The model $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ uses matrix notation to express the relationship compactly.&lt;&#x2F;li&gt;
&lt;li&gt;The OLS estimator $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$ generalizes the simple linear regression solution.&lt;&#x2F;li&gt;
&lt;li&gt;Each $\hat{\beta}_j$ measures the effect of one predictor while holding the others constant.&lt;&#x2F;li&gt;
&lt;li&gt;The hat matrix $\mathbf{H}$ projects observed values onto fitted values.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
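&lt;p&gt;The hat matrix deserves a quick numerical illustration. The sketch below uses a small randomly generated design matrix (purely hypothetical data) to confirm that $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ maps $\mathbf{Y}$ to the fitted values and is idempotent, as a projection should be:&lt;&#x2F;p&gt;

```python
import numpy as np

# Hypothetical design matrix (intercept plus two predictors) and response
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(6), rng.normal(size=6), rng.normal(size=6)])
Y = rng.normal(size=6)

# Hat matrix H = X (X^T X)^{-1} X^T
H = X @ np.linalg.inv(X.T @ X) @ X.T

# H Y equals the fitted values X beta_hat
beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ Y)
fitted = X @ beta_hat
print(np.allclose(H @ Y, fitted))  # True: H projects Y onto the fitted values
print(np.allclose(H @ H, H))       # True: H is idempotent (a projection)
```

&lt;p&gt;Symmetry and idempotency ($\mathbf{H}^2 = \mathbf{H}$) are the defining algebraic properties of an orthogonal projection matrix.&lt;&#x2F;p&gt;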
&lt;p&gt;In the next post, we will explore polynomial regression, which is a special case of multiple linear regression where the predictors are powers of a single variable.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Simple Linear Regression</title>
        <published>2026-01-02T00:00:00+00:00</published>
        <updated>2026-01-02T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/notes/regression-01-simple-linear-regression/"/>
        <id>https://jienweng.github.io/notes/regression-01-simple-linear-regression/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/notes/regression-01-simple-linear-regression/">&lt;p&gt;This note introduces simple linear regression from first principles and focuses on how slope and intercept are estimated from data. The problem it solves is modeling a linear relationship between one predictor and one response. It is intended as the base layer for the later regression notes in this series.&lt;&#x2F;p&gt;
&lt;p&gt;In secondary school, we learn that the equation of a straight line is given by $y = mx + c$, where $m$ is the slope and $c$ is the y-intercept. In statistics and machine learning, we use a similar but more general form to model the relationship between a dependent variable $y$ and an independent variable $x$. This is known as simple linear regression.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-model&quot;&gt;The Model&lt;&#x2F;h2&gt;
&lt;p&gt;In simple linear regression, we express the relationship as:&lt;&#x2F;p&gt;
&lt;p&gt;$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i,$$&lt;&#x2F;p&gt;
&lt;p&gt;where $\hat{y}_i$ is the predicted value of the dependent variable for the $i$-th observation, $\hat{\beta}_0$ is the estimated y-intercept, and $\hat{\beta}_1$ is the estimated slope coefficient. The &quot;hat&quot; notation indicates that these are estimates derived from data, not the true (unknown) population parameters $\beta_0$ and $\beta_1$.&lt;&#x2F;p&gt;
&lt;p&gt;The true model is assumed to be:&lt;&#x2F;p&gt;
&lt;p&gt;$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,$$&lt;&#x2F;p&gt;
&lt;p&gt;where $\varepsilon_i$ represents the random error term for the $i$-th observation.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;ordinary-least-squares-ols&quot;&gt;Ordinary Least Squares (OLS)&lt;&#x2F;h2&gt;
&lt;p&gt;The question is: how do we find the best estimates $\hat{\beta}_0$ and $\hat{\beta}_1$? We need a systematic method that determines the line of best fit. The most common approach is Ordinary Least Squares (OLS).&lt;&#x2F;p&gt;
&lt;p&gt;OLS minimizes the sum of the squared differences between the observed values $y_i$ and the predicted values $\hat{y}_i$. These differences are called residuals, defined as $e_i = y_i - \hat{y}_i$. By squaring the residuals, we treat positive and negative errors equally and penalize larger deviations more heavily.&lt;&#x2F;p&gt;
&lt;p&gt;The objective function is the Sum of Squared Errors (SSE):&lt;&#x2F;p&gt;
&lt;p&gt;$$SSE = \sum^n_{i=1}(y_i - \hat{y}_i)^2 = \sum^n_{i=1}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2.$$&lt;&#x2F;p&gt;
&lt;h2 id=&quot;deriving-the-normal-equations&quot;&gt;Deriving the Normal Equations&lt;&#x2F;h2&gt;
&lt;p&gt;To minimize the SSE, we take partial derivatives with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$ and set them equal to zero.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;partial-derivative-with-respect-to-hat-beta-0&quot;&gt;Partial derivative with respect to $\hat{\beta}_0$&lt;&#x2F;h3&gt;
&lt;p&gt;$$\frac{\partial SSE}{\partial \hat{\beta}_0} = -2\sum^n_{i=1}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0.$$&lt;&#x2F;p&gt;
&lt;p&gt;Dividing both sides by $-2$ and expanding the sum:&lt;&#x2F;p&gt;
&lt;p&gt;$$\sum^n_{i=1} y_i - n\hat{\beta}_0 - \hat{\beta}_1 \sum^n_{i=1} x_i = 0.$$&lt;&#x2F;p&gt;
&lt;p&gt;Solving for $\hat{\beta}_0$:&lt;&#x2F;p&gt;
&lt;p&gt;$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x},$$&lt;&#x2F;p&gt;
&lt;p&gt;where $\bar{x} = \frac{1}{n}\sum^n_{i=1}x_i$ and $\bar{y} = \frac{1}{n}\sum^n_{i=1}y_i$ are the sample means.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;partial-derivative-with-respect-to-hat-beta-1&quot;&gt;Partial derivative with respect to $\hat{\beta}_1$&lt;&#x2F;h3&gt;
&lt;p&gt;$$\frac{\partial SSE}{\partial \hat{\beta}_1} = -2\sum^n_{i=1}x_i(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0.$$&lt;&#x2F;p&gt;
&lt;p&gt;Dividing by $-2$ and expanding:&lt;&#x2F;p&gt;
&lt;p&gt;$$\sum^n_{i=1} x_i y_i - \hat{\beta}_0 \sum^n_{i=1} x_i - \hat{\beta}_1 \sum^n_{i=1} x_i^2 = 0.$$&lt;&#x2F;p&gt;
&lt;p&gt;Substituting $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$:&lt;&#x2F;p&gt;
&lt;p&gt;$$\sum^n_{i=1} x_i y_i - (\bar{y} - \hat{\beta}_1 \bar{x})\sum^n_{i=1} x_i - \hat{\beta}_1 \sum^n_{i=1} x_i^2 = 0.$$&lt;&#x2F;p&gt;
&lt;p&gt;After simplification, we obtain:&lt;&#x2F;p&gt;
&lt;p&gt;$$\hat{\beta}_1 = \frac{\sum^n_{i=1}(x_i - \bar{x})(y_i - \bar{y})}{\sum^n_{i=1}(x_i - \bar{x})^2}.$$&lt;&#x2F;p&gt;
&lt;h2 id=&quot;introducing-s-xx-and-s-xy-notation&quot;&gt;Introducing $S_{xx}$ and $S_{xy}$ Notation&lt;&#x2F;h2&gt;
&lt;p&gt;To write the estimators more concisely, we define the following summary statistics:&lt;&#x2F;p&gt;
&lt;p&gt;$$S_{xx} = \sum^n_{i=1}(x_i - \bar{x})^2 = \sum^n_{i=1}x_i^2 - n\bar{x}^2,$$&lt;&#x2F;p&gt;
&lt;p&gt;$$S_{xy} = \sum^n_{i=1}(x_i - \bar{x})(y_i - \bar{y}) = \sum^n_{i=1}x_i y_i - n\bar{x}\bar{y}.$$&lt;&#x2F;p&gt;
&lt;p&gt;Using this notation, the OLS estimators become:&lt;&#x2F;p&gt;
&lt;p&gt;$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}},$$&lt;&#x2F;p&gt;
&lt;p&gt;$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$&lt;&#x2F;p&gt;
&lt;p&gt;These are elegant expressions that reveal the structure of the estimates. The slope $\hat{\beta}_1$ is the ratio of the joint variability of $x$ and $y$ (captured by $S_{xy}$) to the variability of $x$ alone (captured by $S_{xx}$). The intercept $\hat{\beta}_0$ ensures the regression line passes through the point $(\bar{x}, \bar{y})$.&lt;&#x2F;p&gt;
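&lt;p&gt;The estimators are straightforward to compute directly. The snippet below is a minimal sketch using a small made-up dataset (the numbers are illustrative, not from this post) and the shortcut forms of $S_{xx}$ and $S_{xy}$:&lt;&#x2F;p&gt;

```python
import numpy as np

# Hypothetical data (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.1, 3.8, 5.1])

x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum(x ** 2) - len(x) * x_bar ** 2        # shortcut form
S_xy = np.sum(x * y) - len(x) * x_bar * y_bar      # shortcut form

beta1_hat = S_xy / S_xx
beta0_hat = y_bar - beta1_hat * x_bar
print(beta1_hat, beta0_hat)  # slope ≈ 0.97, intercept ≈ 0.11
```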
&lt;h2 id=&quot;residuals-and-fitted-values&quot;&gt;Residuals and Fitted Values&lt;&#x2F;h2&gt;
&lt;p&gt;Once we have $\hat{\beta}_0$ and $\hat{\beta}_1$, we can compute:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Fitted values&lt;&#x2F;strong&gt;: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ for each observation.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Residuals&lt;&#x2F;strong&gt;: $e_i = y_i - \hat{y}_i$ for each observation.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Two important properties of OLS residuals are worth noting:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;The residuals sum to zero: $\sum^n_{i=1} e_i = 0$.&lt;&#x2F;li&gt;
&lt;li&gt;The residuals are uncorrelated with the fitted values: $\sum^n_{i=1} e_i \hat{y}_i = 0$.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;These properties follow directly from the normal equations.&lt;&#x2F;p&gt;
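&lt;p&gt;Both residual properties can be verified numerically on any dataset, since they follow from the normal equations rather than from the data. A minimal check with hypothetical numbers:&lt;&#x2F;p&gt;

```python
import numpy as np

# Hypothetical data; any dataset works since the properties are algebraic
x = np.array([0.5, 1.5, 2.0, 3.5, 4.0, 5.5])
y = np.array([1.1, 2.0, 2.3, 3.9, 4.2, 5.8])

# OLS estimates
S_xx = np.sum((x - x.mean()) ** 2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = S_xy / S_xx
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x   # fitted values
e = y - y_hat         # residuals

print(np.isclose(e.sum(), 0.0))            # True: residuals sum to zero
print(np.isclose(np.sum(e * y_hat), 0.0))  # True: uncorrelated with fitted values
```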
&lt;h2 id=&quot;key-assumptions&quot;&gt;Key Assumptions&lt;&#x2F;h2&gt;
&lt;p&gt;For OLS to produce reliable estimates, the following assumptions are typically required:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Linearity&lt;&#x2F;strong&gt;: The relationship between $x$ and $y$ is linear in the parameters.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Independence&lt;&#x2F;strong&gt;: The observations are independent of one another.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Homoscedasticity&lt;&#x2F;strong&gt;: The variance of the error terms $\varepsilon_i$ is constant across all values of $x$.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Normality&lt;&#x2F;strong&gt;: The error terms are normally distributed, that is, $\varepsilon_i \sim N(0, \sigma^2)$.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;When these assumptions hold, OLS produces the Best Linear Unbiased Estimators (BLUE) according to the Gauss-Markov theorem.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;&#x2F;h2&gt;
&lt;p&gt;In this post, we covered the fundamentals of simple linear regression:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;The model $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ describes the estimated linear relationship between $x$ and $y$.&lt;&#x2F;li&gt;
&lt;li&gt;OLS minimizes the sum of squared errors to find the best-fitting line.&lt;&#x2F;li&gt;
&lt;li&gt;The estimators $\hat{\beta}&lt;em&gt;1 = S&lt;&#x2F;em&gt;{xy}&#x2F;S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$ are derived from the normal equations.&lt;&#x2F;li&gt;
&lt;li&gt;The $S_{xx}$ and $S_{xy}$ notation provides a compact way to express these results.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;In the next post, we will extend this framework to handle multiple independent variables through multiple linear regression.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Let&#x27;s talk about Solar Photovoltaic Systems in WWTPs Malaysia</title>
        <published>2025-12-01T00:00:00+00:00</published>
        <updated>2025-12-02T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/notes/photovoltaic-systems-in-wwtps-malaysia/"/>
        <id>https://jienweng.github.io/notes/photovoltaic-systems-in-wwtps-malaysia/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/notes/photovoltaic-systems-in-wwtps-malaysia/">&lt;p&gt;This note reviews how solar photovoltaic systems can reduce energy cost and emissions in wastewater treatment plants (WWTPs) in Malaysia. The core problem is that WWTP operations are electricity-intensive and exposed to tariff volatility. I summarize constraints, control architecture, and where ML-based forecasting can improve deployment decisions.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;why-solar-pv-in-wwtps&quot;&gt;Why Solar PV in WWTPs?&lt;&#x2F;h1&gt;
&lt;p&gt;Wastewater treatment facilities are globally recognized as energy-intensive operations, with electrical energy accounting for a significant portion of their operational expenditure (OPEX). For conventional WWTPs, energy use contributes between 25% and 60% of total OPEX &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Muzaffar2022&lt;&#x2F;sup&gt;
. This dependency on the central electricity grid exposes these facilities to electricity tariff volatility and supply uncertainties. The national sewerage company, Indah Water Konsortium (IWK) Sdn Bhd, has seen electricity costs balloon from RM22.53 million in 2000 to RM256.30 million in 2020 &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;IWK2021&lt;&#x2F;sup&gt;
. This became the primary motivation for IWK to explore renewable energy sources, particularly solar PV systems, to mitigate energy costs and enhance sustainability.&lt;&#x2F;p&gt;
&lt;p&gt;IWK operates as Malaysia&#x27;s national sewerage company and manages a vast network of public treatment plants. As of December 2021, IWK operated and maintained 7,272 public Sewerage Treatment Plants (STPs) and 1,375 network pump stations across the country &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;IWK2021_report&lt;&#x2F;sup&gt;
. Regulatory oversight is provided by the Suruhanjaya Perkhidmatan Air Negara (SPAN), which ensures that operators comply with national water quality standards and contractual obligations.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;energy-demand-characteristics-of-wwtps&quot;&gt;Energy Demand Characteristics of WWTPs&lt;&#x2F;h1&gt;
&lt;p&gt;Before implementing solar PV systems, it is crucial to understand the energy demand characteristics of WWTPs in order to size and optimize the PV system correctly. The energy consumption profile of a WWTP is highly influenced by the plant scale and the dominant treatment technology used.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;specific-energy-consumption-sec-and-categorization&quot;&gt;Specific Energy Consumption (SEC) and Categorization&lt;&#x2F;h2&gt;
&lt;p&gt;Energy intensity, generally measured as Specific Energy Consumption (SEC) in kilowatt-hours per cubic meter (kWh&#x2F;m³) of treated wastewater, is a key metric for evaluating the energy efficiency of WWTPs. SEC values can vary significantly with the treatment processes employed and the plant&#x27;s capacity. Data suggest that smaller WWTPs, particularly those below $10,000 m^3&#x2F;month$ capacity, tend to bear a disproportionately higher electricity cost, accounting for 30%-40% of their total running costs &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Muzaffar2022&lt;&#x2F;sup&gt;
. In contrast, larger WWTPs typically exhibit lower SEC values, with electricity accounting for 15%-30% of total running costs.&lt;&#x2F;p&gt;
&lt;p&gt;Specific data collected from village WWTPs in Romania indicated high annual average SEC values, ranging between $1.786 kWh&#x2F;m^3$ to $2.334 kWh&#x2F;m^3$ of treated wastewater &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Tokos2021&lt;&#x2F;sup&gt;
. Furthermore, sewage sludge disposal volumes have increased rapidly alongside population growth, reaching 7 million $m^3$ annually. Managing this sludge requires significant energy input, contributing to the sector&#x27;s overall energy consumption, with sewage sludge treatment alone consuming an estimated $544,900 GWh$ across IWK operations between 2016 and 2019 &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Quan2022&lt;&#x2F;sup&gt;
.&lt;&#x2F;p&gt;
&lt;p&gt;This observed variability indicates that the primary candidates for immediate solar PV integration are the small to medium-sized conventional STPs. These plants have higher SEC values and a higher dependence on expensive grid power, meaning the marginal return from solar PV energy offset is maximized in this segment.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;identification-of-energy-intensive-processes&quot;&gt;Identification of Energy-Intensive Processes&lt;&#x2F;h2&gt;
&lt;p&gt;The greatest concentration of electrical energy demand within a conventional WWTP is typically found in the biological treatment line (BgT), primarily driven by the aeration systems. Aeration is necessary in activated sludge processes to maintain adequate dissolved oxygen levels for microbial activity. Data confirm that BgT accounts for the majority of total energy consumption, ranging from 63.2% to 72.9% in surveyed WWTPs &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Tokos2021&lt;&#x2F;sup&gt;
.&lt;&#x2F;p&gt;
&lt;p&gt;Whether through diffused air or mechanical aerators (such as those used in &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.epa.gov&#x2F;system&#x2F;files&#x2F;documents&#x2F;2022-10&#x2F;oxidation-ditch-factsheet.pdf&quot;&gt;oxidation ditches&lt;&#x2F;a&gt;), aeration systems require continuous operation to provide oxygen transfer, circulation, and mixing. This continuous, high-power demand during daytime operational hours is highly advantageous for solar PV integration. The steady daytime demand from blowers and aerators means that generated solar power can be immediately and entirely consumed, leading to high utilization rates and diminishing the need for Battery Energy Storage Systems (BESS), which are often a significant cost driver in renewable energy (RE) projects.&lt;&#x2F;p&gt;
&lt;p&gt;The high concentration of energy use in aeration suggests that before implementing solar PV systems, WWTP operators should first improve the energy efficiency of the aeration systems. Simply installing a large PV array without optimizing the blowers and biological processes may lead to an oversized and more costly system. Streamlining the anaerobic biological treatments is possible and should be prioritized to reduce the overall energy demand before sizing the solar PV system.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;advanced-pv-power-forecasting-and-control-systems&quot;&gt;Advanced PV Power Forecasting and Control Systems&lt;&#x2F;h1&gt;
&lt;p&gt;The modern integration of photovoltaic systems into WWTPs requires sophisticated forecasting and control architectures. Recent research demonstrates a clear evolution toward hybrid AI-based forecasting models that combine deep learning with optimization algorithms &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;IturraldeCarrera2025&lt;&#x2F;sup&gt;
. These advanced models significantly outperform classical deterministic methods in handling the highly dynamic and non-linear conditions of real-world PV generation.&lt;&#x2F;p&gt;
&lt;p&gt;Machine learning ensemble algorithms such as XGBoost, LightGBM, and CatBoost have emerged as foundational tools for high-accuracy solar power prediction &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Nguyen2025&lt;&#x2F;sup&gt;
. A critical finding is that humidity and ambient temperature emerge as the most influential factors affecting PV module efficiency, particularly in tropical and humid climates.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;integrated-energy-management&quot;&gt;Integrated Energy Management&lt;&#x2F;h2&gt;
&lt;p&gt;Effective PV system integration requires a multi-layered control hierarchy. The foundational layer is Maximum Power Point Tracking (MPPT), which continuously adjusts the DC-DC converter to extract maximum available power from the solar array. A critical technical constraint in many jurisdictions is the Zero Export requirement, which prohibits injecting electrical power back into the grid. To comply, Zero Export Controllers (ZEC) operate on near-instantaneous feedback loops &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Alnawafah2025&lt;&#x2F;sup&gt;
.&lt;&#x2F;p&gt;
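&lt;p&gt;To make the zero-export idea concrete, here is a deliberately simplified sketch of the curtailment logic. The function name, the capping rule, and the margin parameter are all assumptions for illustration; a real ZEC operates on fast metering feedback and inverter setpoints:&lt;&#x2F;p&gt;

```python
def curtail_pv(pv_available_kw, site_load_kw, margin_kw=1.0):
    """Limit PV output so the site never exports power to the grid.

    pv_available_kw : power the array could produce right now
    site_load_kw    : current plant demand (e.g. aeration blowers)
    margin_kw       : safety headroom kept below the measured load
    All names and the rule itself are illustrative, not a real controller API.
    """
    # Cap output at the load minus a headroom margin, never below zero
    ceiling = max(site_load_kw - margin_kw, 0.0)
    return min(pv_available_kw, ceiling)

# Midday: the array could make 120 kW but the load is 100 kW, so output is capped
print(curtail_pv(120.0, 100.0))  # 99.0
# Morning: load exceeds available PV, so no curtailment is needed
print(curtail_pv(40.0, 100.0))   # 40.0
```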
&lt;p&gt;Battery Energy Storage Systems provide flexibility to address both PV intermittency and zero-export constraints. Advanced control algorithms optimize charging during low-cost periods and discharging during peak tariff hours &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Hvala2025&lt;&#x2F;sup&gt;
.&lt;&#x2F;p&gt;
&lt;!-- # Demand-Side Optimization

Maximizing PV self-consumption requires accurate prediction of not only generation but also internal treatment demand. Dynamic ensemble models utilizing machine learning have been successfully applied to predict water quality characteristics such as COD and TN, reducing prediction errors to 9.5%-15.2% MAPE &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Yu2025&lt;&#x2F;sup&gt;
. This intelligent demand-side management enables facilities to achieve substantial energy consumption reductions while modulating energy draw in response to predicted PV output. --&gt;
&lt;!-- # Floating Photovoltaic Systems

Floating Photovoltaic (FPV) installations over basins represent an emerging solution to land-use constraints. FPV systems provide inherent performance advantages through water cooling effects that counteract thermal degradation. By actively mitigating thermal stress, FPV improves both instantaneous yield and long-term reliability &lt;sup class=&quot;cite-ref&quot; title=&quot;bibliography&amp;#x2F;renewable_energy.bib&quot;&gt;Selj2025&lt;&#x2F;sup&gt;
.

The aquatic deployment environment introduces unique technical challenges requiring specialized components such as double-glass laminated modules and IP68-rated electrical components to ensure durability in humid and corrosive conditions. --&gt;
&lt;h1 id=&quot;conclusion&quot;&gt;Conclusion&lt;&#x2F;h1&gt;
&lt;p&gt;The successful integration of solar PV systems into Malaysian WWTPs requires a deliberate strategy built on advanced forecasting. By combining accurate generation forecasting with sophisticated demand-side prediction, facilities can achieve near energy-autonomous operation while positioning themselves as flexible power assets responding to grid demands.&lt;&#x2F;p&gt;
&lt;p&gt;In future research, we need to explore the detailed implementation of predictive modelling and control algorithms in WWTPs, as well as conduct in-depth literature reviews of case studies of existing energy prediction systems in WWTPs.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Gradient Descent Algorithm Explained</title>
        <published>2025-11-30T00:00:00+00:00</published>
        <updated>2025-11-30T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/notes/gradient-descent/"/>
        <id>https://jienweng.github.io/notes/gradient-descent/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/notes/gradient-descent/">&lt;p&gt;This note explains gradient descent as an optimization procedure for minimizing differentiable objectives and clarifies the role of the learning rate in convergence behavior. The practical question is not only how the rule works, but when it becomes unstable. The walkthrough focuses on the core math and interpretable examples.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;mathematics-technical-parts&quot;&gt;Mathematics Technical Parts&lt;&#x2F;h2&gt;
&lt;p&gt;Consider a differentiable function $f: \mathbb{R}^n \to \mathbb{R}$. The gradient of $f$ at a point $x \in \mathbb{R}^n$ is denoted $\nabla f(x)$ and is the vector of partial derivatives. The gradient points in the direction of steepest ascent of the function, so to find a local minimum we need to move in the opposite direction. Gradient descent exploits this with the update rule:&lt;&#x2F;p&gt;
&lt;p&gt;$$x_{k+1} = x_k - \alpha \nabla f(x_k),$$&lt;&#x2F;p&gt;
&lt;p&gt;where:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;$x_k$ is the current point,&lt;&#x2F;li&gt;
&lt;li&gt;$\alpha$ is the learning rate, where $\alpha &amp;gt; 0$,&lt;&#x2F;li&gt;
&lt;li&gt;$\nabla f(x_k)$ is the gradient of $f$ at point $x_k$.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;We simulate two different scenarios: one where the loss function has a positive gradient at the starting point, and another where the gradient is negative.&lt;&#x2F;p&gt;
&lt;hr&gt;
&lt;h3 id=&quot;scenario-1-positive-gradient&quot;&gt;Scenario 1: Positive Gradient&lt;&#x2F;h3&gt;
&lt;p&gt;Let&#x27;s consider a simple quadratic function:&lt;&#x2F;p&gt;
&lt;p&gt;$$f(x) = x^2 + 4x + 4.$$&lt;&#x2F;p&gt;
&lt;p&gt;The gradient of this function is:&lt;&#x2F;p&gt;
&lt;p&gt;$$\nabla f(x) = 2x + 4.$$&lt;&#x2F;p&gt;
&lt;p&gt;Starting from an initial point, say $x_0 = 0$, and choosing a learning rate $\alpha = 0.1$, we can apply the gradient descent update rule iteratively:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Compute the gradient at the current point: $\nabla f(x_0) = 2(0) + 4 = 4$.&lt;&#x2F;li&gt;
&lt;li&gt;Update the point: $x_1 = x_0 - 0.1 \cdot 4 = 0 - 0.4 = -0.4$.&lt;&#x2F;li&gt;
&lt;li&gt;Repeat the process for a number of iterations.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;For the first 5 iterations, we can tabulate the results as follows:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Iteration ($k$)&lt;&#x2F;th&gt;&lt;th&gt;Current Point ($x_k$)&lt;&#x2F;th&gt;&lt;th&gt;Gradient ($\nabla f(x_k)$)&lt;&#x2F;th&gt;&lt;th&gt;Updated Point ($x_{k+1}$)&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;0&lt;&#x2F;td&gt;&lt;td&gt;0.0&lt;&#x2F;td&gt;&lt;td&gt;4.0&lt;&#x2F;td&gt;&lt;td&gt;-0.4&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;1&lt;&#x2F;td&gt;&lt;td&gt;-0.4&lt;&#x2F;td&gt;&lt;td&gt;3.2&lt;&#x2F;td&gt;&lt;td&gt;-0.72&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;2&lt;&#x2F;td&gt;&lt;td&gt;-0.72&lt;&#x2F;td&gt;&lt;td&gt;2.56&lt;&#x2F;td&gt;&lt;td&gt;-0.976&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;3&lt;&#x2F;td&gt;&lt;td&gt;-0.976&lt;&#x2F;td&gt;&lt;td&gt;2.048&lt;&#x2F;td&gt;&lt;td&gt;-1.1808&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;4&lt;&#x2F;td&gt;&lt;td&gt;-1.1808&lt;&#x2F;td&gt;&lt;td&gt;1.6384&lt;&#x2F;td&gt;&lt;td&gt;-1.34464&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;We can observe that, starting from a point with a positive gradient, each update moves $x$ in the negative direction, toward the minimum at $x = -2$. The first step is the largest, and the step size gradually decreases as we approach the minimum. This is the brilliant part of gradient descent: because each step is proportional to the gradient, it automatically takes larger steps when we are far from the minimum and smaller steps as we get closer.&lt;&#x2F;p&gt;
&lt;p&gt;We can visualize the process using a simple plot:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo z-code&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;import&lt;&#x2F;span&gt;&lt;span&gt; numpy&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; as&lt;&#x2F;span&gt;&lt;span&gt; np&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;import&lt;&#x2F;span&gt;&lt;span&gt; matplotlib&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;pyplot&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; as&lt;&#x2F;span&gt;&lt;span&gt; plt&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-punctuation z-definition z-comment&quot;&gt;#&lt;&#x2F;span&gt;&lt;span class=&quot;z-comment&quot;&gt; Define the function and its gradient&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-storage z-type&quot;&gt;def&lt;&#x2F;span&gt;&lt;span class=&quot;z-entity z-name&quot;&gt; f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable z-parameter z-function&quot;&gt;x&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;    return&lt;&#x2F;span&gt;&lt;span&gt; x&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;**&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;2&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; +&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 4&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;*&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; +&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 4&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-storage z-type&quot;&gt;def&lt;&#x2F;span&gt;&lt;span class=&quot;z-entity z-name&quot;&gt; grad_f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable z-parameter z-function&quot;&gt;x&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;    return&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 2&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;*&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; +&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 4&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-punctuation z-definition z-comment&quot;&gt;#&lt;&#x2F;span&gt;&lt;span class=&quot;z-comment&quot;&gt; Gradient Descent parameters&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;alpha&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 0.1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;x0&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 0&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;iterations&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 20&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-punctuation z-definition z-comment&quot;&gt;#&lt;&#x2F;span&gt;&lt;span class=&quot;z-comment&quot;&gt; Store the points&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; [&lt;&#x2F;span&gt;&lt;span&gt;x0&lt;&#x2F;span&gt;&lt;span&gt;]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;for&lt;&#x2F;span&gt;&lt;span&gt; _&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; in&lt;&#x2F;span&gt;&lt;span class=&quot;z-support&quot;&gt; range&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;iterations&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    grad&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; grad_f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;[&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;-&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;1&lt;&#x2F;span&gt;&lt;span&gt;]&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    x_new&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; x_points&lt;&#x2F;span&gt;&lt;span&gt;[&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;-&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;1&lt;&#x2F;span&gt;&lt;span&gt;]&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; -&lt;&#x2F;span&gt;&lt;span&gt; alpha&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; *&lt;&#x2F;span&gt;&lt;span&gt; grad&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    x_points&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;append&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_new&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-punctuation z-definition z-comment&quot;&gt;#&lt;&#x2F;span&gt;&lt;span class=&quot;z-comment&quot;&gt; Plotting&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; np&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;linspace&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;-&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;5&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 1&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 100&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;y&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;plot&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span&gt; y&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; label&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;f(x) = x^2 + 4x + 4&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;scatter&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span&gt; f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;np&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;array&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; color&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;red&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;plot&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span&gt; f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;np&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;array&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; color&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;red&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; linestyle&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;--&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; label&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;Gradient Descent Path&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;title&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;Gradient Descent on f(x)&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;xlabel&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;x&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;ylabel&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;f(x)&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;legend&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;grid&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;show&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;After 20 iterations, the points converge towards the minimum point at $x = -2$. We can see how the points move along the curve of the function, gradually approaching the minimum.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;316430f9-5f75-41db-b0ba-4553e3763a94?format=jpeg&quot; alt=&quot;Gradient Descent Visualization&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;hr&gt;
&lt;h3 id=&quot;scenario-2-negative-gradient&quot;&gt;Scenario 2: Negative Gradient&lt;&#x2F;h3&gt;
&lt;p&gt;Now, let&#x27;s consider a function with a negative gradient:
$$f(x) = -x^3 + 4x^2 - 4.$$&lt;&#x2F;p&gt;
&lt;p&gt;The gradient of this function is:
$$\nabla f(x) = -3x^2 + 8x.$$&lt;&#x2F;p&gt;
&lt;p&gt;Similar to the previous example, we start from an initial point, say $x_0 = 1$, choose a learning rate $\alpha = 0.01$, and apply the gradient descent update rule iteratively:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Compute the gradient at the current point: $\nabla f(x_0) = -3(1)^2 + 8(1) = 5$.&lt;&#x2F;li&gt;
&lt;li&gt;Update the point: $x_1 = x_0 - 0.01 \cdot 5 = 0.95$.&lt;&#x2F;li&gt;
&lt;li&gt;Repeat the process for a number of iterations.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;For the first 5 iterations, we can tabulate the results as follows:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Iteration ($k$)&lt;&#x2F;th&gt;&lt;th&gt;Current Point ($x_k$)&lt;&#x2F;th&gt;&lt;th&gt;Gradient ($\nabla f(x_k)$)&lt;&#x2F;th&gt;&lt;th&gt;Updated Point ($x_{k+1}$)&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;0&lt;&#x2F;td&gt;&lt;td&gt;1.0000&lt;&#x2F;td&gt;&lt;td&gt;5.0000&lt;&#x2F;td&gt;&lt;td&gt;0.9500&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;1&lt;&#x2F;td&gt;&lt;td&gt;0.9500&lt;&#x2F;td&gt;&lt;td&gt;4.8925&lt;&#x2F;td&gt;&lt;td&gt;0.9011&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;2&lt;&#x2F;td&gt;&lt;td&gt;0.9011&lt;&#x2F;td&gt;&lt;td&gt;4.7728&lt;&#x2F;td&gt;&lt;td&gt;0.8533&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;3&lt;&#x2F;td&gt;&lt;td&gt;0.8533&lt;&#x2F;td&gt;&lt;td&gt;4.6422&lt;&#x2F;td&gt;&lt;td&gt;0.8069&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;4&lt;&#x2F;td&gt;&lt;td&gt;0.8069&lt;&#x2F;td&gt;&lt;td&gt;4.5020&lt;&#x2F;td&gt;&lt;td&gt;0.7619&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
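&lt;p&gt;The rows of this table can be reproduced with a short script (a minimal sketch reusing the same update rule, with $\alpha = 0.01$ and $x_0 = 1$):&lt;&#x2F;p&gt;

```python
# Reproduce the first five gradient descent iterations for
# f(x) = -x^3 + 4x^2 - 4, whose gradient is -3x^2 + 8x.
def grad_f(x):
    return -3 * x**2 + 8 * x

alpha = 0.01   # learning rate
x = 1.0        # initial point x_0

for k in range(5):
    g = grad_f(x)
    x_next = x - alpha * g
    print(f"k={k}  x_k={x:.4f}  grad={g:.4f}  x_k+1={x_next:.4f}")
    x = x_next
```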
&lt;p&gt;In this scenario, the gradient stays positive along the path, so each update moves $x$ to the left, toward the local minimum at $x = 0$. The first step is the largest, and the step size gradually decreases as the gradient shrinks near the minimum. As in the previous scenario, gradient descent automatically adjusts its step size based on the distance from the minimum. Similarly, we can visualize the process using a simple plot:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo z-code&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;import&lt;&#x2F;span&gt;&lt;span&gt; numpy&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; as&lt;&#x2F;span&gt;&lt;span&gt; np&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;import&lt;&#x2F;span&gt;&lt;span&gt; matplotlib&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;pyplot&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; as&lt;&#x2F;span&gt;&lt;span&gt; plt&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-punctuation z-definition z-comment&quot;&gt;#&lt;&#x2F;span&gt;&lt;span class=&quot;z-comment&quot;&gt; Define the function and its gradient&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-storage z-type&quot;&gt;def&lt;&#x2F;span&gt;&lt;span class=&quot;z-entity z-name&quot;&gt; f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable z-parameter z-function&quot;&gt;x&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;    return&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; -&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;**&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;3&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; +&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 4&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;*&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;**&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;2&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; -&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 4&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-storage z-type&quot;&gt;def&lt;&#x2F;span&gt;&lt;span class=&quot;z-entity z-name&quot;&gt; grad_f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable z-parameter z-function&quot;&gt;x&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;    return&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; -&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;3&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;*&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;**&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;2&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; +&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 8&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;*&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-punctuation z-definition z-comment&quot;&gt;#&lt;&#x2F;span&gt;&lt;span class=&quot;z-comment&quot;&gt; Gradient Descent parameters&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;alpha&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 0.01&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;x0&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;iterations&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 40&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-punctuation z-definition z-comment&quot;&gt;#&lt;&#x2F;span&gt;&lt;span class=&quot;z-comment&quot;&gt; Store the points&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; [&lt;&#x2F;span&gt;&lt;span&gt;x0&lt;&#x2F;span&gt;&lt;span&gt;]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-keyword&quot;&gt;for&lt;&#x2F;span&gt;&lt;span&gt; _&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; in&lt;&#x2F;span&gt;&lt;span class=&quot;z-support&quot;&gt; range&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;iterations&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    grad&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; grad_f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;[&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;-&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;1&lt;&#x2F;span&gt;&lt;span&gt;]&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    x_new&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; x_points&lt;&#x2F;span&gt;&lt;span&gt;[&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;-&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;1&lt;&#x2F;span&gt;&lt;span&gt;]&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; -&lt;&#x2F;span&gt;&lt;span&gt; alpha&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; *&lt;&#x2F;span&gt;&lt;span&gt; grad&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    x_points&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;append&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_new&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span class=&quot;z-punctuation z-definition z-comment&quot;&gt;#&lt;&#x2F;span&gt;&lt;span class=&quot;z-comment&quot;&gt; Plotting&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; np&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;linspace&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;-&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;1&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 3&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 100&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;y&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;plot&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span&gt; y&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; label&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;f(x) = -x^3 + 4x^2 - 4&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;scatter&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span&gt; f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;np&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;array&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; color&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;red&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;plot&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span&gt; f&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;np&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;array&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x_points&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; color&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;red&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; linestyle&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;--&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-variable&quot;&gt; label&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;=&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;Gradient Descent Path&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;title&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;Gradient Descent on f(x)&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;xlabel&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;x&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;ylabel&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span class=&quot;z-string&quot;&gt;f(x)&lt;&#x2F;span&gt;&lt;span class=&quot;z-punctuation z-definition z-string&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;legend&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;grid&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;plt&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;show&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;After 40 iterations, the points converge towards the local minimum at $x = 0$. We can see how the points move along the curve of the function, gradually approaching the minimum. The gradient descent path is plotted on the function curve:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;081bc8a3-abb0-47db-8da6-97c1a7a4ba1b?format=jpeg&quot; alt=&quot;Gradient Descent Visualization&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;hr&gt;
&lt;h2 id=&quot;proof-of-convergence&quot;&gt;Proof of Convergence&lt;&#x2F;h2&gt;
&lt;p&gt;To prove the convergence of the gradient descent algorithm, we need to show that the sequence of points generated by the algorithm converges to a local minimum of the function $f(x)$. We assume that $f$ is a convex function with Lipschitz continuous gradients, meaning there exists a constant $L &amp;gt; 0$ such that for all $x, y \in \mathbb{R}^n$,
$$|\nabla f(x) - \nabla f(y)| \leq L |x - y|.$$&lt;&#x2F;p&gt;
&lt;p&gt;Under convexity, with a fixed step size $\alpha \le 1&#x2F;L$, the rate of convergence is sublinear. Specifically, we can show that after $k$ iterations, the function value satisfies:&lt;&#x2F;p&gt;
&lt;p&gt;$$
f(x_k) - f(x^*) \leq \frac{L |x_0-x^*|^2 }{2k},
$$&lt;&#x2F;p&gt;
&lt;p&gt;where $x^*$ is the global minimum point of $f$. This indicates that as the number of iterations $k$ increases, the function value approaches the minimum value at a rate inversely proportional to $k$.&lt;&#x2F;p&gt;
&lt;p&gt;This completes the proof of convergence for the gradient descent algorithm under the assumptions of convexity and Lipschitz continuous gradients. Under these assumptions, the algorithm converges to the global minimum of the function $f(x)$ by iteratively updating the points in the direction of steepest descent.&lt;&#x2F;p&gt;
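&lt;p&gt;As an illustrative numerical check (not a substitute for the proof), we can verify the $O(1&#x2F;k)$ bound on a simple convex quadratic. Assuming $f(x) = x^2$, the gradient $2x$ is Lipschitz with $L = 2$, and a fixed step size $\alpha \le 1&#x2F;L$ keeps every iterate within the bound:&lt;&#x2F;p&gt;

```python
# Check f(x_k) - f(x*) <= L * |x_0 - x*|^2 / (2k) numerically
# for f(x) = x^2, which is convex with L = 2 and minimum at x* = 0.
def f(x):
    return x**2

def grad_f(x):
    return 2 * x

L = 2.0
alpha = 0.25   # any fixed step size <= 1/L works here
x0 = 3.0

x = x0
for k in range(1, 21):
    x = x - alpha * grad_f(x)
    bound = L * x0**2 / (2 * k)
    assert f(x) <= bound, f"bound violated at iteration {k}"
print("the O(1/k) bound holds for the first 20 iterations")
```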
&lt;p&gt;But this guarantee is tightly tied to the choice of learning rate $\alpha$. If $\alpha$ is too large, the algorithm may overshoot the minimum and diverge. If $\alpha$ is too small, convergence will be very slow. Choosing an appropriate learning rate is therefore crucial for the success of the gradient descent algorithm. There are various techniques to adaptively adjust the learning rate during optimization, such as learning rate schedules and adaptive optimizers like Adam and RMSprop, which we can explore in future posts.&lt;&#x2F;p&gt;
&lt;hr&gt;
&lt;h2 id=&quot;known-issues-of-gradient-descent&quot;&gt;Known Issues of Gradient Descent&lt;&#x2F;h2&gt;
&lt;p&gt;While gradient descent is a powerful optimization algorithm, it does have some known issues. In multivariate functions, the presence of saddle points can affect convergence. Saddle points are points where the gradient is zero, but they are neither local minima nor local maxima. In high-dimensional spaces, saddle points are more prevalent than local minima, and gradient descent can stall at them, leading to slow convergence or failure to find a good minimum. A popular example is the function $f(x, y) = x^2 - y^2$, which has a saddle point at $(0, 0)$. The gradient at this point is zero, so gradient descent may struggle to escape it.&lt;&#x2F;p&gt;
&lt;figure class=&quot;wide&quot;&gt;
  &lt;img src=&quot;https:&amp;#x2F;&amp;#x2F;cdn.cosmos.so&amp;#x2F;41b26430-2fde-4cfb-8d2b-a9debca6d4ee?format=jpeg&quot; alt=&quot;Gradient Descent Saddle Point&quot;&gt;
  
  &lt;figcaption&gt;Gradient Descent Saddle Point&lt;&#x2F;figcaption&gt;
  
&lt;&#x2F;figure&gt;
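&lt;p&gt;A minimal sketch of this behaviour on $f(x, y) = x^2 - y^2$: starting exactly at the saddle, the gradient is zero and the iterate never moves, while starting a tiny perturbation away, the $y$ coordinate escapes only gradually:&lt;&#x2F;p&gt;

```python
# Plain gradient descent on f(x, y) = x^2 - y^2, saddle at (0, 0).
def grad(x, y):
    return 2 * x, -2 * y   # partial derivatives

def descend(x, y, alpha=0.1, steps=50):
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - alpha * gx, y - alpha * gy
    return x, y

# Exactly at the saddle: the gradient is zero, so nothing ever moves.
print(descend(0.0, 0.0))

# Slightly perturbed: x shrinks toward 0, while y grows by a factor
# of (1 + 2 * alpha) per step, so the iterate escapes only slowly.
print(descend(0.0, 1e-8))
```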
&lt;p&gt;To mitigate the issues with saddle points, various techniques can be employed, such as adding noise to the gradients, using momentum-based methods, or employing second-order optimization methods that consider the curvature of the function, which we can explore in future discussions. But overall, gradient descent remains a fundamental and widely used optimization algorithm in machine learning and various other fields.&lt;&#x2F;p&gt;
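&lt;p&gt;As one hedged illustration of these remedies, classical momentum keeps a running velocity so that small but consistent gradients accumulate, letting the iterate leave the neighbourhood of the saddle in fewer steps than plain gradient descent (a toy comparison, not a production optimizer):&lt;&#x2F;p&gt;

```python
# Compare how many steps plain gradient descent and momentum need
# to escape the saddle of f(x, y) = x^2 - y^2 from a nearby point.
def grad(x, y):
    return 2 * x, -2 * y

def steps_to_escape(use_momentum, alpha=0.1, beta=0.9, y0=1e-8):
    x, y = 0.0, y0
    vx = vy = 0.0
    for step in range(1, 10000):
        gx, gy = grad(x, y)
        if use_momentum:
            # the velocity accumulates past gradients
            vx, vy = beta * vx - alpha * gx, beta * vy - alpha * gy
            x, y = x + vx, y + vy
        else:
            x, y = x - alpha * gx, y - alpha * gy
        if abs(y) > 1.0:   # far enough from the saddle
            return step
    return None

plain = steps_to_escape(False)
momentum = steps_to_escape(True)
print(f"plain: {plain} steps, momentum: {momentum} steps")
```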
&lt;hr&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;&#x2F;h2&gt;
&lt;p&gt;Despite its simplicity, gradient descent is a powerful optimization algorithm that forms the backbone of many machine learning algorithms. By iteratively updating the parameters in the direction of the steepest descent, gradient descent effectively finds local minima of differentiable functions. Understanding its mathematical foundations and practical implementations is crucial for anyone working in the field of machine learning and optimization.&lt;&#x2F;p&gt;
&lt;p&gt;It&#x27;s remarkable how such a simple iterative process can optimize almost any complex function in real-life applications. I truly cannot appreciate enough the beauty of this elegant mathematical concept.&lt;&#x2F;p&gt;
&lt;p&gt;For those interested in learning more about gradient descent, I highly recommend the following video by StatQuest, which provides an excellent visual explanation of the algorithm:&lt;&#x2F;p&gt;
&lt;figure class=&quot;wide video-embed&quot;&gt;
  &lt;div class=&quot;video-shell&quot;&gt;
    &lt;iframe
      src=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;embed&#x2F;sDv4f4s2SB8&quot;
      title=&quot;Gradient Descent, Step-by-Step | StatQuest&quot;
      loading=&quot;lazy&quot;
      allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot;
      referrerpolicy=&quot;strict-origin-when-cross-origin&quot;
      allowfullscreen
    &gt;&lt;&#x2F;iframe&gt;
  &lt;&#x2F;div&gt;
  
  &lt;figcaption&gt;Gradient Descent, Step-by-Step | StatQuest&lt;&#x2F;figcaption&gt;
  
&lt;&#x2F;figure&gt;
&lt;p&gt;I hope this post has provided a clear understanding of gradient descent.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>My Gallery of Talentbank Boardroom Challenge 2025</title>
        <published>2025-11-14T00:00:00+00:00</published>
        <updated>2025-11-14T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/blog/talentbank-boardroom-challenge/"/>
        <id>https://jienweng.github.io/blog/talentbank-boardroom-challenge/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/blog/talentbank-boardroom-challenge/">&lt;p&gt;This post is a compact gallery plus debrief from the Talentbank Boardroom Challenge 2025. I keep the narrative focused on preparation, presentation decisions, and the specific lessons carried into later projects.&lt;&#x2F;p&gt;
&lt;figure class=&quot;wide&quot;&gt;
  &lt;img src=&quot;https:&amp;#x2F;&amp;#x2F;cdn.cosmos.so&amp;#x2F;757eebfe-bcda-458c-a666-bb88ff978ed2?format=jpeg&quot; alt=&quot;Talentbank Boardroom Challenge 2025&quot;&gt;
  
&lt;&#x2F;figure&gt;
&lt;figure class=&quot;wide&quot;&gt;
  &lt;img src=&quot;https:&amp;#x2F;&amp;#x2F;cdn.cosmos.so&amp;#x2F;e9ab40f7-f55a-40b0-90f9-43afddce3592?format=jpeg&quot; alt=&quot;Talentbank Boardroom Challenge 2025&quot;&gt;
  
&lt;&#x2F;figure&gt;
&lt;figure class=&quot;wide&quot;&gt;
  &lt;img src=&quot;https:&amp;#x2F;&amp;#x2F;cdn.cosmos.so&amp;#x2F;22cf9c61-bcc4-4cb3-a5d3-901668997566?format=jpeg&quot; alt=&quot;Talentbank Boardroom Challenge 2025&quot;&gt;
  
&lt;&#x2F;figure&gt;
&lt;figure class=&quot;wide&quot;&gt;
  &lt;img src=&quot;https:&amp;#x2F;&amp;#x2F;cdn.cosmos.so&amp;#x2F;e06b9475-0616-4202-b7cb-7644eefba819?format=jpeg&quot; alt=&quot;Talentbank Boardroom Challenge 2025&quot;&gt;
  
&lt;&#x2F;figure&gt;
&lt;figure class=&quot;wide&quot;&gt;
  &lt;img src=&quot;https:&amp;#x2F;&amp;#x2F;cdn.cosmos.so&amp;#x2F;dc8979eb-c7e4-4c5a-9530-921feb0b0ae4?format=jpeg&quot; alt=&quot;Talentbank Boardroom Challenge 2025&quot;&gt;
  
&lt;&#x2F;figure&gt;
&lt;figure class=&quot;wide&quot;&gt;
  &lt;img src=&quot;https:&amp;#x2F;&amp;#x2F;cdn.cosmos.so&amp;#x2F;4a589e9c-6ab9-458d-a659-7c47b8f8b583?format=jpeg&quot; alt=&quot;Talentbank Boardroom Challenge 2025&quot;&gt;
  
&lt;&#x2F;figure&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Reinforcement learning practices in healthcare applications</title>
        <published>2025-10-21T00:00:00+00:00</published>
        <updated>2025-10-21T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/notes/reinforcement-learning-practices-in-healthcare/"/>
        <id>https://jienweng.github.io/notes/reinforcement-learning-practices-in-healthcare/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/notes/reinforcement-learning-practices-in-healthcare/">&lt;p&gt;This note reviews practical reinforcement learning use cases in healthcare and the constraints that matter in deployment. The challenge is that policy learning in clinical settings is high-stakes, partially observed, and often offline. I summarize where RL is promising and where reliability and safety dominate design choices.&lt;&#x2F;p&gt;
&lt;p&gt;In healthcare applications, artificial intelligence (AI) plays a crucial role in transforming patient care, diagnostics, and treatment planning, making healthcare more efficient and effective. However, if AI is used improperly, it may lead to worse outcomes rather than improved ones.&lt;&#x2F;p&gt;
&lt;p&gt;As a subset of AI, reinforcement learning (RL) has shown great promise in optimising sequential decision-making processes, which are common in the healthcare industry. However, applying RL in healthcare settings requires careful attention to several important practices to ensure safe outcomes. To illustrate the pitfalls of reinforcement learning, we consider sepsis management, an area where clinicians&#x27; decision-making remains highly uncertain.&lt;&#x2F;p&gt;
&lt;p&gt;In the context of sepsis, a history may include a patient&#x27;s vital signs, laboratory results, administered treatments, and other relevant clinical information over time. The actions could involve decisions such as administering fluids, vasopressors, or antibiotics at different time points. The rewards are typically defined based on patient outcomes, such as survival rates, length of hospital stay, or improvement in clinical scores. Note that defining ideal sepsis resuscitation strategies is challenging due to the complex and dynamic nature of the condition and the variability in patient responses to treatment; it is therefore not straightforward to define short-term rewards for each action taken.&lt;&#x2F;p&gt;
&lt;p&gt;Here are three fundamental concerns when applying reinforcement learning in healthcare:&lt;&#x2F;p&gt;
&lt;h1 id=&quot;is-the-ai-given-access-to-all-variables-that-infleunce-decision-making&quot;&gt;Is the AI given access to all variables that influence decision-making?&lt;&#x2F;h1&gt;
&lt;p&gt;An RL agent can only look at the recorded data, yet much more information and context should be taken into consideration. Failing to consider all relevant variables may result in estimates that are confounded by spurious correlations.&lt;&#x2F;p&gt;
&lt;p&gt;For instance, severely sick septic patients may receive fluids earlier than healthier patients yet have worse outcomes. This is because they were sicker in the first place, not because the fluids worsened their outcomes. It is therefore important to account for possible confounding factors, even more so than in standard prediction studies, because the sequential nature of the problem can introduce confounding effects in both the short term and the long term.&lt;&#x2F;p&gt;
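&lt;p&gt;A toy simulation makes this concrete (my own sketch, not from the cited paper): here illness severity is an unobserved confounder that drives both fluid administration and mortality, so a naive comparison makes fluids look harmful even though, in this model, they truly help:&lt;&#x2F;p&gt;

```python
import random

random.seed(0)

# Toy confounding illustration: severity drives both treatment assignment
# and outcome. Fluids truly help (they subtract 0.05 from the death
# probability), but the naive comparison says the opposite.
n = 20_000
deaths = {True: 0, False: 0}
counts = {True: 0, False: 0}
for _ in range(n):
    severity = random.random()              # unobserved illness severity
    treated = severity > random.random()    # sicker patients get fluids more often
    p_death = max(0.0, 0.6 * severity - (0.05 if treated else 0.0))
    died = random.random() > 1.0 - p_death  # death occurs with probability p_death
    counts[treated] += 1
    deaths[treated] += died

treated_rate = deaths[True] / counts[True]
control_rate = deaths[False] / counts[False]
print(f"mortality with fluids: {treated_rate:.2f}, without: {control_rate:.2f}")
```

The treated group skews toward sicker patients, so its raw mortality rate comes out higher despite the beneficial treatment effect.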
&lt;h1 id=&quot;how-big-was-that-big-data&quot;&gt;How big was that big data?&lt;&#x2F;h1&gt;
&lt;p&gt;This one is relatively straightforward. Any AI model needs an adequate amount of useful information during training, and RL models are no exception. For an RL model to evaluate a new policy, it needs to find long, continuous sequences of decisions in the historical data that match the new policy.&lt;&#x2F;p&gt;
&lt;p&gt;When a new treatment policy is evaluated against historical data, known as off-policy evaluation, the effective sample size can become small, because the mismatches grow with the number of decisions in a patient&#x27;s history. In one sepsis study, a cohort of 3,855 patients yielded an effective sample size of only a few dozen. Observational data should therefore be used to refine existing practices rather than to explore entirely new treatment approaches.&lt;&#x2F;p&gt;
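&lt;p&gt;The shrinkage is easy to see in a back-of-the-envelope sketch (the per-step agreement probability below is illustrative, not taken from the study): a logged trajectory is only fully usable if it agrees with the new policy at every decision, so the expected count decays exponentially with the horizon:&lt;&#x2F;p&gt;

```python
# Expected number of logged trajectories that agree with a new policy
# at every decision point, assuming an independent per-step match probability.
def effective_sample_size(n_patients, n_decisions, p_match_per_step):
    return n_patients * p_match_per_step ** n_decisions

# 3,855 patients, 50% chance of agreeing with the new policy per decision
for horizon in (1, 5, 10):
    print(horizon, effective_sample_size(3855, horizon, 0.5))
```

Even a modest 10-decision history leaves only a handful of matching trajectories out of thousands of patients, which is why long horizons make off-policy evaluation so data-hungry.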
&lt;h1 id=&quot;will-the-ai-behave-prospectively-as-intended&quot;&gt;Will the AI behave prospectively as intended?&lt;&#x2F;h1&gt;
&lt;p&gt;One of the core elements in the feedback loop of any RL system is the reward. However, if the design of the reward function is not handled properly (e.g. errors in formulation or data processing), the model will eventually lead to poor decisions.&lt;&#x2F;p&gt;
&lt;p&gt;Often, an overly simple reward function neglects long-term effects. For instance, rewarding only blood pressure targets may produce an agent that harms long-term outcomes by dosing patients with excessive vasopressors. Additionally, the learned policy might decay over time if treatment standards change.&lt;&#x2F;p&gt;
&lt;p&gt;Therefore, it is necessary to use interpretable machine learning to interrogate learned policies and assess whether they will behave as intended in a prospective clinical setting.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;conclusion&quot;&gt;Conclusion&lt;&#x2F;h1&gt;
&lt;p&gt;In the end, although RL offers promising opportunities for optimising sequential treatments in medicine, we should be cautious about deploying it into production; due diligence is required to safely realise its potential in this life-saving industry.&lt;&#x2F;p&gt;
&lt;figure class=&quot;wide&quot;&gt;
  &lt;img src=&quot;&amp;#x2F;img&amp;#x2F;posts&amp;#x2F;guidelines_for_reinforcement_learning_in_healthcare.jpg&quot; alt=&quot;Guidelines for Reinforcement Learning in Healthcare&quot;&gt;
  
  &lt;figcaption&gt;Guidelines for Reinforcement Learning in Healthcare&lt;&#x2F;figcaption&gt;
  
&lt;&#x2F;figure&gt;
&lt;p&gt;Finally, I would like to express my gratitude to Omer Gottesman et al. for providing such a practical viewpoint on standardising the application of reinforcement learning in clinical settings.&lt;&#x2F;p&gt;
&lt;p&gt;References:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Gottesman, O., Johansson, F., Komorowski, M., Faisal, A., Sontag, D., Doshi-Velez, F., &amp;amp; Celi, L. A. (2019). Guidelines for reinforcement learning in healthcare. Nature Medicine, 25(1), 16–18. &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;doi.org&#x2F;10.1038&#x2F;s41591-018-0310-5&quot;&gt;https:&#x2F;&#x2F;doi.org&#x2F;10.1038&#x2F;s41591-018-0310-5&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Action-value methods with incremental step size in reinforcement learning</title>
        <published>2025-10-17T00:00:00+00:00</published>
        <updated>2025-10-17T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/notes/action-value-methods-with-incremental-step-size/"/>
        <id>https://jienweng.github.io/notes/action-value-methods-with-incremental-step-size/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/notes/action-value-methods-with-incremental-step-size/">&lt;p&gt;This note derives the incremental update rule for action-value estimation in k-armed bandits and explains why it is preferable to recomputing full averages. The problem is memory and compute cost when rewards accumulate over time. By the end, you get a practical update equation you can implement directly in RL experiments.&lt;&#x2F;p&gt;
&lt;p&gt;Consider any &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Multi-armed_bandit&quot;&gt;k-armed bandit problem&lt;&#x2F;a&gt;, where each action taken corresponds to pulling an arm of a slot machine, and the machine gives us a reward based on the action taken. We denote the action selected at time step $t$ as $A_t$, and the corresponding reward received as $R_t$. Among the $k$ actions, the expected value of each action $a$ is denoted as $q_*(a)=\mathbb{E}[R_t|A_t=a]$, which is also known as the &lt;em&gt;value&lt;&#x2F;em&gt; of that action $a$, or, as the community commonly calls it, the &lt;em&gt;true value&lt;&#x2F;em&gt;. It would be rational to choose the action with the highest value, but in practice the value of each action is unknown. Therefore, we need to estimate the value of each action, denoted $Q_t(a)$ and read as the estimated value of action $a$ at time step $t$. The fundamental goal is to find a $Q_t(a)$ that is as close as possible to $q_*(a)$.&lt;&#x2F;p&gt;
&lt;p&gt;To estimate the value of each action, using what are collectively called &lt;em&gt;action-value methods&lt;&#x2F;em&gt;, we can use the sample-average method to update the estimated value of action $a$ at time step $t$ as follows:&lt;&#x2F;p&gt;
&lt;p&gt;$$Q_{t}(a)={{\text{sum of rewards when $a$ taken prior to $t$}} \over {\text{number of times $a$ taken prior to $t$}}}.$$&lt;&#x2F;p&gt;
&lt;p&gt;This method simply averages all the rewards received when action $a$ was taken prior to time step $t$. To simplify the notation, we focus on a single action. We denote $R_i$ as the reward received after the $i$-th selection of this action, and we let $Q_n$ denote the estimate of its action value after it has been taken $n-1$ times. We can then rewrite the update rule as follows:&lt;&#x2F;p&gt;
&lt;p&gt;$$Q_n = \frac{\sum_{i=1}^{n-1}R_i}{n-1} = \frac{R_1+R_2+\ldots+R_{n-1}}{n-1}.$$&lt;&#x2F;p&gt;
&lt;p&gt;As the number of times the action has been taken, $n$, increases, the obvious way to update the estimate is to recalculate the average by summing all the previous rewards and dividing by $n-1$. However, as $n$ grows large, this method becomes progressively more expensive: we need to store all the previous rewards and recalculate the sum every time we update the estimate.&lt;&#x2F;p&gt;
&lt;p&gt;But is there a better way to update the estimate without storing all the previous rewards? The answer is yes. We derive the incremental formula for updating the estimate:&lt;&#x2F;p&gt;
&lt;p&gt;$$
\begin{align*}
Q_{n+1} &amp;amp; = \frac{1}{n}\sum_{i=1}^{n}R_i \\
&amp;amp; = \frac{1}{n}\left(R_n + \sum_{i=1}^{n-1}R_i\right) \\
&amp;amp; = \frac{1}{n}\left(R_n + (n-1)\frac{1}{n-1}\sum_{i=1}^{n-1}R_i\right) \\
&amp;amp; = \frac{1}{n}\left(R_n + (n-1)Q_n\right) \\
&amp;amp; = \frac{1}{n}\left(R_n+nQ_n-Q_n \right) \\
&amp;amp; = Q_n + \frac{1}{n}[R_n - Q_n].
\end{align*}
$$&lt;&#x2F;p&gt;
&lt;p&gt;This incremental formula allows us to update the estimate $Q_n$ to $Q_{n+1}$ by only using the most recent reward $R_n$ and the previous estimate $Q_n$, without the need to store all the previous rewards. The term $\frac{1}{n}$ serves as the step size, which decreases as $n$ increases, ensuring that the estimate converges to the true value over time.&lt;&#x2F;p&gt;
&lt;p&gt;Even for $n=1$, we still obtain $Q_2 = R_1$ for an arbitrary initial estimate $Q_1$. In this case, the initial estimate $Q_1$ is completely ignored after the first update, as it should be. In processing the $n$th reward, the estimate is adjusted by a fraction of the error term $[R_n - Q_n]$, the difference between the received reward and the current estimate. This adjustment is scaled by the step size $\frac{1}{n}$, which ensures that as more data is collected, the updates become smaller, allowing the estimate to stabilize around the true value. Note that the step size here is not constant; it decreases as the number of times the action has been taken increases.&lt;&#x2F;p&gt;
&lt;p&gt;Back to the bandit problem, the proposed simulation in pseudo-code is as follows:&lt;&#x2F;p&gt;
&lt;details class=&quot;detail-block&quot;&gt;
  &lt;summary&gt;Bandit Problem with Incremental Step Size&lt;&#x2F;summary&gt;
  &lt;div class=&quot;detail-body&quot;&gt;
    &lt;pre class=&quot;giallo z-code&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Initialize&lt;&#x2F;span&gt;&lt;span&gt; Q&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;a&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt; arbitrarily&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; for&lt;&#x2F;span&gt;&lt;span class=&quot;z-support&quot;&gt; all&lt;&#x2F;span&gt;&lt;span&gt; actions&lt;&#x2F;span&gt;&lt;span&gt; a&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;For&lt;&#x2F;span&gt;&lt;span&gt; each&lt;&#x2F;span&gt;&lt;span&gt; time&lt;&#x2F;span&gt;&lt;span&gt; step&lt;&#x2F;span&gt;&lt;span&gt; t&lt;&#x2F;span&gt;&lt;span&gt; = &lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;1&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; 2&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt; ...&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    Select&lt;&#x2F;span&gt;&lt;span&gt; action&lt;&#x2F;span&gt;&lt;span&gt; A_t&lt;&#x2F;span&gt;&lt;span&gt; using&lt;&#x2F;span&gt;&lt;span&gt; a&lt;&#x2F;span&gt;&lt;span&gt; policy&lt;&#x2F;span&gt;&lt;span&gt; derived&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; from&lt;&#x2F;span&gt;&lt;span&gt; Q&lt;&#x2F;span&gt;&lt;span&gt; (&lt;&#x2F;span&gt;&lt;span&gt;e&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;g&lt;&#x2F;span&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span&gt; ε&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;-&lt;&#x2F;span&gt;&lt;span&gt;greedy&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    Take&lt;&#x2F;span&gt;&lt;span&gt; action&lt;&#x2F;span&gt;&lt;span&gt; A_t&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; and&lt;&#x2F;span&gt;&lt;span&gt; observe&lt;&#x2F;span&gt;&lt;span&gt; reward&lt;&#x2F;span&gt;&lt;span&gt; R_t&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    Update&lt;&#x2F;span&gt;&lt;span&gt; the&lt;&#x2F;span&gt;&lt;span&gt; estimate&lt;&#x2F;span&gt;&lt;span&gt; Q&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;A_t&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt; using&lt;&#x2F;span&gt;&lt;span&gt;:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        n&lt;&#x2F;span&gt;&lt;span&gt; = &lt;&#x2F;span&gt;&lt;span&gt;number&lt;&#x2F;span&gt;&lt;span&gt; of&lt;&#x2F;span&gt;&lt;span&gt; times&lt;&#x2F;span&gt;&lt;span&gt; action&lt;&#x2F;span&gt;&lt;span&gt; A_t&lt;&#x2F;span&gt;&lt;span&gt; has&lt;&#x2F;span&gt;&lt;span&gt; been&lt;&#x2F;span&gt;&lt;span&gt; taken&lt;&#x2F;span&gt;&lt;span&gt; prior&lt;&#x2F;span&gt;&lt;span&gt; to&lt;&#x2F;span&gt;&lt;span&gt; time&lt;&#x2F;span&gt;&lt;span&gt; t&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        Q&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;A_t&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt; = &lt;&#x2F;span&gt;&lt;span&gt;Q&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;A_t&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; +&lt;&#x2F;span&gt;&lt;span&gt; (&lt;&#x2F;span&gt;&lt;span class=&quot;z-constant&quot;&gt;1&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt;&#x2F;&lt;&#x2F;span&gt;&lt;span&gt;n&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; *&lt;&#x2F;span&gt;&lt;span&gt; [&lt;&#x2F;span&gt;&lt;span&gt;R_t&lt;&#x2F;span&gt;&lt;span class=&quot;z-keyword&quot;&gt; -&lt;&#x2F;span&gt;&lt;span&gt; Q&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span&gt;A_t&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span&gt;]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;End&lt;&#x2F;span&gt;&lt;span&gt; For&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
  &lt;&#x2F;div&gt;
&lt;&#x2F;details&gt;
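&lt;p&gt;A minimal runnable sketch of the pseudo-code above (the arm values, number of steps, and ε are arbitrary choices for illustration):&lt;&#x2F;p&gt;

```python
import random

random.seed(1)

# 3-armed bandit with epsilon-greedy selection and the incremental
# sample-average update Q(A) = Q(A) + (1/n) * (R - Q(A)).
true_values = [0.2, 0.5, 0.8]   # q*(a), unknown to the agent
k = len(true_values)
Q = [0.0] * k                   # action-value estimates Q(a)
N = [0] * k                     # per-action selection counts n
epsilon = 0.1

for t in range(5000):
    if random.random() > epsilon:                 # exploit with prob 1 - epsilon
        a = max(range(k), key=lambda i: Q[i])
    else:                                         # explore uniformly at random
        a = random.randrange(k)
    r = true_values[a] + random.gauss(0.0, 1.0)   # noisy reward around q*(a)
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]                     # incremental update, no reward history kept

print([round(q, 2) for q in Q])
```

Only the current estimates and counts are stored, yet the estimates converge toward the true values; for nonstationary problems, a constant step size would be used instead so the estimate can track a moving target.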
&lt;p&gt;To conclude, we derived the incremental step-size method for updating action-value estimates, which is extensively applied in reinforcement learning. This method is computationally efficient because it does not require storing all previous rewards, and it ensures convergence to the true action values over time.&lt;&#x2F;p&gt;
&lt;figure class=&quot;wide&quot;&gt;
  &lt;img src=&quot;https:&amp;#x2F;&amp;#x2F;m.media-amazon.com&amp;#x2F;images&amp;#x2F;I&amp;#x2F;81EBg4xmLgL._UF1000,1000_QL80_.jpg&quot; alt=&quot;Reinforcement Learning: An Introduction&quot;&gt;
  
  &lt;figcaption&gt;Reinforcement Learning: An Introduction&lt;&#x2F;figcaption&gt;
  
&lt;&#x2F;figure&gt;
&lt;p&gt;I want to express my gratitude to Sutton and Barto for their excellent book &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;web.stanford.edu&#x2F;class&#x2F;psych209&#x2F;Readings&#x2F;SuttonBartoIPRLBook2ndEd.pdf&quot;&gt;Reinforcement Learning: An Introduction&lt;&#x2F;a&gt; that provides a comprehensive introduction to the concepts and algorithms of reinforcement learning.&lt;&#x2F;p&gt;
&lt;p&gt;References:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Sutton, R. S., &amp;amp; Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press. &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;web.stanford.edu&#x2F;class&#x2F;psych209&#x2F;Readings&#x2F;SuttonBartoIPRLBook2ndEd.pdf&quot;&gt;https:&#x2F;&#x2F;web.stanford.edu&#x2F;class&#x2F;psych209&#x2F;Readings&#x2F;SuttonBartoIPRLBook2ndEd.pdf&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>A Small Talk About Hackathons</title>
        <published>2025-08-07T00:00:00+00:00</published>
        <updated>2025-08-07T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/blog/something-about-hackathon/"/>
        <id>https://jienweng.github.io/blog/something-about-hackathon/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/blog/something-about-hackathon/">&lt;p&gt;This post is a short reflection on hackathon culture from a participant perspective: what helps teams learn fast, where teams usually waste time, and how to keep projects grounded under deadline pressure.&lt;&#x2F;p&gt;
&lt;p&gt;Along the way, I&#x27;ve met people who are far more experienced, far more knowledgeable than me. And honestly, I can&#x27;t compete with them. It&#x27;s easy to feel small in those moments.&lt;&#x2F;p&gt;
&lt;p&gt;For anyone who&#x27;s been through hackathons, you&#x27;ll know. The amount of time, energy, effort you need to commit is just INSANE. Most competitions require you to go through multiple stages: prelim round, and sometimes a semi-final, and then the final round. Every round is like a mini-marathon, endless brainstorming, last-minute changes, and almost no sleep just to push through and deliver something that works.&lt;&#x2F;p&gt;
&lt;p&gt;In the first few hackathons, I was genuinely excited. It was fun. Every time one ended, I was already looking forward to the next. But after round after round of hackathons, I started to feel the burn. Exhausted. Empty. YES, I still learned something new in every game, but the energy, the excitement, just started to wear off.&lt;&#x2F;p&gt;
&lt;p&gt;Eventually, it all started to feel a bit empty.&lt;&#x2F;p&gt;
&lt;p&gt;There was also this lingering thought back in my mind: &quot;Why am I comparing myself to people who are fully in this industry, who live and breathe in tech every single day, when I&#x27;m still just a student trying to explore things outside my field?&quot;&lt;&#x2F;p&gt;
&lt;p&gt;There&#x27;s still one more hackathon coming up, and I&#x27;ll give it what I can. But after that, I really need a break. A proper one. Not just from hackathons, but from the constant cycle of proving myself. I need to breathe, reset and work on my inner health, mentally and emotionally.&lt;&#x2F;p&gt;
&lt;p&gt;Not saying I&#x27;m quitting hackathons forever. Not at all. But I know I need some time to focus more on myself, rebuild myself, and come back stronger.&lt;&#x2F;p&gt;
&lt;p&gt;Thanks for reading until here. Sometimes, the best thing you can do for your growth is step back, realign, and then go again.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Dead Internet Theory #1</title>
        <published>2025-07-22T00:00:00+00:00</published>
        <updated>2025-07-22T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/blog/dead-internet-theory-1/"/>
        <id>https://jienweng.github.io/blog/dead-internet-theory-1/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/blog/dead-internet-theory-1/">&lt;p&gt;This post is a personal reflection on authenticity online: what feels different now, why AI-generated social content often feels hollow, and where I might be overreacting. The goal is not to claim a grand theory, but to document a concrete shift in reading experience across LinkedIn and Reddit.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;b77a432b-7713-4b10-879d-e8256d284766?format=jpeg&quot; alt=&quot;You can&amp;#39;t tell whether the experience is real or not&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;I do admit I used ChatGPT for my content in the past, but I realised that the content is not really what I’ve done before, the experience is basically “artificial”. I don’t deny AI as a productivity tool, but sometimes you just can’t tell the existence of the content written there. There’s a coldness, an emptiness that creeps in when scrolling through these posts, making it feel like shouting into a void where no real person listens.&lt;&#x2F;p&gt;
&lt;p&gt;It is really different nowadays to scroll through social media, LinkedIn, and Reddit. You don’t feel people there. It makes me really feel like the Dead Internet Theory is here, and everything is full of bot activity and automatically generated content manipulated by algorithms. The authenticity of online interactions seems to be fading, replaced by an automated perfection that feels disturbingly hollow.&lt;&#x2F;p&gt;
&lt;p&gt;Chatbots or AI are really good for proofreading, but they are still only good at proofreading. Soon, I think, people will no longer put real content or real ideas into writing about their experiences; posts just become full of BS nowadays. These days I even wish to find some long-ass article that is awfully organised; it&#x27;s fun to read through even when it isn&#x27;t good, because I can find authenticity there.&lt;&#x2F;p&gt;
&lt;p&gt;Has anyone noticed this? You could share your observations with me or offer me a perspective I haven’t considered before. I’d be more than happy to discuss it. I think we might have another follow-up episode on this.&lt;&#x2F;p&gt;
&lt;p&gt;Share your thoughts with me: &lt;a href=&quot;mailto:contact@jienweng.com&quot;&gt;contact@jienweng.com&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Making Deepseek R1 ChatBot</title>
        <published>2024-12-30T00:00:00+00:00</published>
        <updated>2024-12-30T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/blog/deepseek-r1-chatbot/"/>
        <id>https://jienweng.github.io/blog/deepseek-r1-chatbot/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/blog/deepseek-r1-chatbot/">&lt;p&gt;This post documents a small DeepSeek-R1 chatbot build: why I chose the model, what setup decisions mattered, and what worked in practice. Instead of focusing on AI industry drama, I keep the write-up centered on implementation choices and takeaways for future iterations.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.linkedin.com&#x2F;feed&#x2F;update&#x2F;urn:li:activity:7291035071992520704&#x2F;&quot;&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;d66e7a6d-8205-4e9c-ba4f-656971c79857?format=jpeg&quot; alt=&quot;“Good artists copy, great artists steal” - Steve Jobs&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;But forget the drama for a second, because the best part? Deepseek is open-source. That’s a huge win for the AI community. No more being locked behind API paywalls or waiting for some corporate overlord to decide what we can or can’t do. It’s out there, free to tinker with, and you bet I had to try it out for myself.&lt;&#x2F;p&gt;
&lt;p&gt;So, I went ahead and did something I had wanted to do for soooo long -- built a chatbot. It’s not packed with fancy features (yet), but through this little experiment, I’ve discovered some pretty interesting things about how the Deepseek R1 model works. You can try it out live &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&quot;&gt;here&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;btw, we won’t dive into the technical aspects just yet—that’s coming up in the next section! Stay tuned for more details on how these improvements will work behind the scenes.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-unique-thinking-approach&quot;&gt;The Unique &quot;Thinking&quot; Approach&lt;&#x2F;h3&gt;
&lt;p&gt;What blows my mind the most about this whole setup is how I managed to separate the model’s thinking process from its final response. Most chatbots out there? They just spit out an answer, and you have no idea what’s happening behind the scenes. But with this, you can actually see how the model thinks through a problem before giving an answer. It’s like watching an AI have an inner monologue, refining its thoughts before speaking. And honestly? I’ve never seen this before in any LLMs I’ve used.&lt;&#x2F;p&gt;
&lt;p&gt;At first, I didn’t even plan for this feature—it just happened while I was testing out different ways to improve response quality. I noticed that the model was generating some hidden reasoning steps before its final output. Instead of discarding them, I figured, Why not show them? And once I did, it was a game-changer. It made the AI feel so much more transparent—almost like it was thinking out loud.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&quot;&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;b65be583-5569-41e4-ab62-b4bf500120e1?format=jpeg&quot; alt=&quot;Fun Fact: The &amp;quot;thinking&amp;quot; parts are actually generated as HTML!&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;For example, if you ask it something like, “What do you think about climate change in Malaysia?”, you won’t just get a final answer out of nowhere. You’ll actually see the model go through a step-by-step breakdown of its thought process:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Breaking down the question components&lt;&#x2F;li&gt;
&lt;li&gt;Evaluating current knowledge&lt;&#x2F;li&gt;
&lt;li&gt;Forming logical connections&lt;&#x2F;li&gt;
&lt;li&gt;Synthesizing a comprehensive response&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;After seeing the model’s thinking process, what really stands out to me is how structured its response is. It doesn’t just throw out some generic take on climate change—it actually analyzes the question, breaks it down into different angles, and then builds a well-organized answer.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&quot;&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;6c9e3e2a-d318-4615-8786-7d60248fc049?format=jpeg&quot; alt=&quot;Interesting Observation: The model sometimes includes unexpected details—some accurate, some a bit off!&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;That said, while the response does sound solid, there are some oddities that make me wonder what’s going on under the hood. For example, it mentions “the subtropical Andaman and Nicobar Islands”—which, uh, aren’t even part of Malaysia. Also, “Ch bamboo” initiative? Never heard of that one. These small but noticeable mistakes show that while the model is good at structuring its answers, it still struggles with factual accuracy.&lt;&#x2F;p&gt;
&lt;p&gt;But that’s exactly what makes having a visible thought process so useful. Instead of just blindly trusting AI responses, we can now see how the model arrives at its conclusions—which means we can spot errors more easily. If it had &lt;strong&gt;hallucinated&lt;&#x2F;strong&gt; this stuff in a normal chatbot, I might not have even noticed. But because I can watch it reason through the problem, I can tell where things might be going wrong.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&quot;&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;8cf44525-7111-4a09-87c7-ee6a09d3cb3b?format=jpeg&quot; alt=&quot;AI Hallucination Meme&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;This kind of transparency is what makes AI feel less like a magic black box and more like an actual tool that we can guide, correct, and refine. And that’s honestly what excites me the most about this project.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;deployment-specifications&quot;&gt;Deployment Specifications&lt;&#x2F;h3&gt;
&lt;p&gt;The chatbot is currently hosted on Hugging Face Spaces, running on a basic-tier instance, which means it’s not exactly a powerhouse but still gets the job done. Here’s what it’s running on:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 2 vCPUs&lt;&#x2F;li&gt;
&lt;li&gt;RAM: 16GB&lt;&#x2F;li&gt;
&lt;li&gt;Storage: Basic instance storage&lt;&#x2F;li&gt;
&lt;li&gt;Framework: Gradio&lt;&#x2F;li&gt;
&lt;li&gt;Inference Optimization: FP16 quantization&lt;&#x2F;li&gt;
&lt;li&gt;Average Response Time: 2-3 seconds&lt;&#x2F;li&gt;
&lt;li&gt;Concurrent Users Supported: Up to 10&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;You might notice that the &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&quot;&gt;live preview&lt;&#x2F;a&gt; here can be a bit slow while generating responses. That’s because the hardware isn’t optimized for LLM inference, so it’s working with some limitations. Hope you can bear with it! 😆&lt;&#x2F;p&gt;
&lt;p&gt;If you enjoy the project and want to see it run smoother, you can consider &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;buymeacoffee.com&#x2F;jianrong_jr&quot;&gt;sponsoring me&lt;&#x2F;a&gt;. Who knows? With enough support, I might upgrade the resources for future projects and push this even further :D&lt;&#x2F;p&gt;
&lt;h3 id=&quot;efficient-model-architecture&quot;&gt;Efficient Model Architecture&lt;&#x2F;h3&gt;
&lt;p&gt;The chatbot uses the Deepseek R1 Distilled 1.5B model, which is a significantly compressed version of the original 685B parameter model. Despite having only 1.5 billion parameters, it maintains impressive performance for many tasks.&lt;&#x2F;p&gt;
&lt;p&gt;Key points about the model:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Original model: &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;deepseek-ai&#x2F;DeepSeek-R1&quot;&gt;DeepSeek R1 (685B)&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Distilled version: &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;deepseek-ai&#x2F;DeepSeek-R1-Distill-Qwen-1.5B&quot;&gt;DeepSeek R1 Distill Qwen 1.5B&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Roughly 450x parameter reduction while maintaining core capabilities&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;blog&#x2F;open-r1&quot;&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;8eb746ad-784b-4892-9eb0-4cd26a82af13?format=jpeg&quot; alt=&quot;Model Architecture&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;h3 id=&quot;impressive-benchmark-results&quot;&gt;Impressive Benchmark Results&lt;&#x2F;h3&gt;
&lt;p&gt;What’s most fascinating about this model is how well it holds up when compared to much larger models. Despite having far fewer parameters, it manages to outperform some big names in the AI world for certain tasks.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;medium.com&#x2F;data-science-in-your-pocket&#x2F;deepseek-r1-distill-qwen-1-5b-the-best-small-sized-llm-14eee304d94b&quot;&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;2ac19291-5c72-4a60-b290-bd140a61a4d4?format=jpeg&quot; alt=&quot;Model Comparison Results&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;h4 id=&quot;outstanding-performance-in-key-areas&quot;&gt;Outstanding Performance in Key Areas&lt;&#x2F;h4&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;AIME 2024 (Math Competition)&lt;&#x2F;strong&gt;
&lt;ul&gt;
&lt;li&gt;DeepSeek R1 Distilled: 28.9% Pass@1&lt;&#x2F;li&gt;
&lt;li&gt;GPT-4o: 9.3% Pass@1&lt;&#x2F;li&gt;
&lt;li&gt;Claude 3.5: 16.0% Pass@1&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;MATH-500 (Mathematical Reasoning)&lt;&#x2F;strong&gt;
&lt;ul&gt;
&lt;li&gt;DeepSeek R1 Distilled: 83.9% Pass@1&lt;&#x2F;li&gt;
&lt;li&gt;GPT-4o: 74.6% Pass@1&lt;&#x2F;li&gt;
&lt;li&gt;Claude 3.5: 78.3% Pass@1&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Codeforces (Competitive Programming)&lt;&#x2F;strong&gt;
&lt;ul&gt;
&lt;li&gt;DeepSeek R1 Distilled: 954 Rating&lt;&#x2F;li&gt;
&lt;li&gt;GPT-4o: 759 Rating&lt;&#x2F;li&gt;
&lt;li&gt;Claude 3.5: 717 Rating&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;h3 id=&quot;model-strengths-limitations&quot;&gt;Model Strengths &amp;amp; Limitations&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;Strengths:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Superior reasoning capabilities, especially in mathematics&lt;&#x2F;li&gt;
&lt;li&gt;Highly efficient with only 1.5B parameters&lt;&#x2F;li&gt;
&lt;li&gt;Effective knowledge distillation from larger models&lt;&#x2F;li&gt;
&lt;li&gt;Excellent performance in zero-shot scenarios&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;Limitations:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Lower performance in general coding tasks&lt;&#x2F;li&gt;
&lt;li&gt;Potential language mixing issues&lt;&#x2F;li&gt;
&lt;li&gt;Sensitivity to prompt formatting&lt;&#x2F;li&gt;
&lt;li&gt;Limited performance in broader general knowledge tasks&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;This balanced perspective shows why I chose this model for my chatbot implementation: it provides exceptional reasoning capabilities while remaining lightweight enough for practical deployment.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;try-it-yourself&quot;&gt;Try It Yourself&lt;&#x2F;h3&gt;
&lt;p&gt;Due to iframe restrictions, you can access the live demo through these methods:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&quot;&gt;Direct Link to Demo&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;docs&#x2F;hub&#x2F;spaces-sdks-docker#rest-api&quot;&gt;API Documentation&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&#x2F;tree&#x2F;main&quot;&gt;Source Code&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&quot;&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;0eab3363-76da-4dab-b44b-a4a5d7cdc96f?format=jpeg&quot; alt=&quot;Chatbot interface&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;h3 id=&quot;summary&quot;&gt;Summary&lt;&#x2F;h3&gt;
&lt;p&gt;Deepseek has definitely shaken things up in the AI world, and the drama surrounding it is just the tip of the iceberg. Forget the finger-pointing—this move is a win for the AI community, especially since Deepseek is open-source. No more waiting around for companies to decide how we can use AI; now it’s out there for everyone to play with and improve.&lt;&#x2F;p&gt;
&lt;p&gt;And as for my little experiment—building a chatbot with the Deepseek R1 model—it’s not feature-packed yet, but it’s definitely been a fun ride. You can try it out live &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&quot;&gt;here&lt;&#x2F;a&gt; and see how it works for yourself!&lt;&#x2F;p&gt;
&lt;h3 id=&quot;additional-resources&quot;&gt;Additional Resources&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;deepseek-ai&#x2F;DeepSeek-R1-Distill-Qwen-1.5B&quot;&gt;Model Card&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;docs&#x2F;hub&#x2F;spaces-overview&quot;&gt;Deployment Guide&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;mteb&#x2F;leaderboard&quot;&gt;Performance Benchmarks&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;discuss.huggingface.co&#x2F;&quot;&gt;Community Discussion&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Feel free to experiment with the live demo and share your thoughts!&lt;&#x2F;p&gt;
&lt;h3 id=&quot;references&quot;&gt;References&lt;&#x2F;h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;DeepSeek R1 (685B)&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;The original DeepSeek R1 model, a large-scale AI model with 685 billion parameters, was the precursor to the distilled 1.5B version used in the chatbot.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;deepseek-ai&#x2F;DeepSeek-R1&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;DeepSeek R1 Distill Qwen 1.5B&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;This is the distilled version of the DeepSeek R1 model, compressed to 1.5 billion parameters while retaining core capabilities.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;deepseek-ai&#x2F;DeepSeek-R1-Distill-Qwen-1.5B&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Open R1 Model Architecture&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;Explore the detailed architecture of the DeepSeek R1 model, showcasing its design and structure.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;blog&#x2F;open-r1&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Medium - Deepseek R1 Distill Qwen 1.5B Performance&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;A comparison of the performance between Deepseek R1 Distilled and other models, showing its impressive results in multiple domains.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;medium.com&#x2F;data-science-in-your-pocket&#x2F;deepseek-r1-distill-qwen-1-5b-the-best-small-sized-llm-14eee304d94b&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Hugging Face Space - Chatbot Demo&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;Live demo of the Deepseek R1 chatbot that showcases the model’s response and reasoning capabilities.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Hugging Face - API Documentation&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;Official API documentation for Hugging Face Spaces, helping developers interact with models and integrate them into applications.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;docs&#x2F;hub&#x2F;spaces-sdks-docker#rest-api&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Hugging Face - Source Code&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;Direct access to the source code of the Deepseek R1 chatbot project on Hugging Face Spaces for those interested in contributing or learning.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;jienweng&#x2F;chatbot_v2&#x2F;tree&#x2F;main&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Hugging Face - Model Card&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;Official card for the Deepseek R1 Distilled model, providing details on its functionality and training specifications.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;deepseek-ai&#x2F;DeepSeek-R1-Distill-Qwen-1.5B&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Hugging Face - Deployment Guide&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;Guidelines for deploying models and applications using Hugging Face Spaces.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;docs&#x2F;hub&#x2F;spaces-overview&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Hugging Face - Performance Benchmarks&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;An overview of the model performance across various tasks and benchmarks, showcasing the strengths and weaknesses of different models.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;mteb&#x2F;leaderboard&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Hugging Face - Community Discussion&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;em&gt;Join the community discussions on Hugging Face, where users can ask questions, share insights, and discuss AI-related topics.&lt;&#x2F;em&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;discuss.huggingface.co&#x2F;&quot;&gt;Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Eco Finance: A Sustainable Future Prototype</title>
        <published>2024-12-29T00:00:00+00:00</published>
        <updated>2024-12-29T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/blog/eco-finance/"/>
        <id>https://jienweng.github.io/blog/eco-finance/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/blog/eco-finance/">&lt;p&gt;This post focuses on the Eco Finance prototype itself: the problem we targeted, the product concept, and what we learned from turning an idea into a demo under hackathon constraints. It is written as a project debrief rather than an event recap.&lt;&#x2F;p&gt;
&lt;p&gt;We participated in PayHack 2024, a major hackathon that brought together many talented people.&lt;&#x2F;p&gt;
&lt;p&gt;Our project, Eco Finance, is built around Malaysia&#x27;s carbon tax, expected to take effect in 2026. Although the tax initially targets specific industries, extending carbon awareness to everyone in Malaysia is both important and timely.&lt;&#x2F;p&gt;
&lt;p&gt;European countries are already implementing ESG-centric policies in various industries, such as the automobile industry. These are just preliminary steps; now we want to delve deeper into the subject.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;payhack-2024.vercel.app&#x2F;&quot;&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;3034bd0a-9dae-4078-80a9-a641c747b58a?format=jpeg&quot; alt=&quot;Eco Finance Interface&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;understanding-the-idea&quot;&gt;Understanding the idea&lt;&#x2F;h2&gt;
&lt;p&gt;Do you know how much carbon footprint you generate from ordering a Shopee parcel? Or how much carbon you generate by driving to work instead of taking public transport? It&#x27;s challenging for people to visualize their footprint, so we set out to make it visible and raise awareness of the issue. According to Visa, 80% of Malaysians are aware of the environmental impact of their consumption. And with Malaysia&#x27;s largest payment network now linking banks through open finance, the transaction data needed to surface that footprint is finally within reach.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;how-it-works&quot;&gt;How it works&lt;&#x2F;h2&gt;
&lt;p&gt;The OpenFinance API can seamlessly integrate people&#x27;s transaction details, allowing us to aggregate a person&#x27;s transactions and their carbon footprints. The calculation would be the emission factor times the amount, giving us the carbon footprint from those transactions.&lt;&#x2F;p&gt;
&lt;p&gt;For example, if the emission factor for a specific merchant category is 0.5 kg CO2 per RM, and a person spends RM 100, the carbon footprint would be 0.5 kg CO2&#x2F;RM * 100 RM = 50 kg CO2.&lt;&#x2F;p&gt;
&lt;p&gt;There&#x27;s an established merchant category code (MCC) in transaction details, where we only need to fine-tune and investigate the actual emission factors for each merchant code. This could be done by collaborating with the Department of Statistics Malaysia (DOSM) to conduct surveys and research among Malaysians. In this project, we are using dummy variables based on this ideology only.&lt;&#x2F;p&gt;
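&lt;p&gt;The calculation above can be sketched in a few lines of Python. The MCC keys and per-category factors below are hypothetical placeholders (the prototype itself used dummy values); the default factor reproduces the worked example of 0.5 kg CO2 per RM:&lt;&#x2F;p&gt;

```python
# Sketch of the footprint calculation described above. The MCC keys and
# emission factors are illustrative dummy values, as in the prototype;
# real factors would come from DOSM survey data.
DEFAULT_FACTOR = 0.5  # kg CO2 per RM, matches the worked example

EMISSION_FACTORS = {   # kg CO2 per RM spent, keyed by merchant category code
    "5411": 0.25,      # grocery stores and supermarkets
    "5541": 0.75,      # service / petrol stations
    "4111": 0.125,     # local commuter transport
}

def footprint_kg(mcc: str, amount_rm: float) -> float:
    """Carbon footprint of one transaction: emission factor x amount."""
    return EMISSION_FACTORS.get(mcc, DEFAULT_FACTOR) * amount_rm

def total_footprint_kg(transactions: list[tuple[str, float]]) -> float:
    """Aggregate a person's transactions into one footprint figure."""
    return sum(footprint_kg(mcc, amount) for mcc, amount in transactions)

# The worked example from the text: 0.5 kg CO2/RM x RM 100 = 50 kg CO2.
print(footprint_kg("0000", 100))  # -> 50.0
```

&lt;p&gt;Because the MCC already rides along with every transaction, the whole pipeline reduces to a lookup and a multiply; the hard part is calibrating the factor table, not the code.&lt;&#x2F;p&gt;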
&lt;h2 id=&quot;monetizing-the-ecosystem&quot;&gt;Monetizing the Ecosystem&lt;&#x2F;h2&gt;
&lt;p&gt;We keep this whole ecosystem circulating by monetizing it. Following Worldometer&#x27;s figures, we take the average carbon emission per person in Malaysia, 8 tons per year, as the baseline; anyone who emits less than that generates a surplus, which we convert into carbon credits and pool. There&#x27;s proven market potential: the Bursa Carbon Exchange (BCX), established on 9 December 2022, already trades carbon credits in Malaysia. We then sell the pooled credits to major Malaysian companies like Petronas and Maybank to help them offset their emissions. &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.allenandgledhill.com&#x2F;perspectives&#x2F;publications&#x2F;bulletins-malaysia&#x2F;2023&#x2F;bursa-malaysia-launches-voluntary-carbon-market-exchange&#x2F;&quot;&gt;Read more about BCX here&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=1QKwHFVsEXE&quot;&gt;Watch the accompanying video on YouTube&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;From &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.petronas.com&#x2F;sustainability&#x2F;delivering-net-zero&quot;&gt;Petronas&lt;&#x2F;a&gt;, it&#x27;s evident that their future plan aims for net-zero carbon emission by 2050. This proves there is a market in Malaysia, and more people will enter the market and participate.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;encouraging-eco-friendly-practices&quot;&gt;Encouraging Eco-Friendly Practices&lt;&#x2F;h2&gt;
&lt;p&gt;How do we encourage eco-friendly spending habits in Malaysia? We aim to attract everyone, even those who are not initially interested in eco-friendly practices. By rewarding users who spend in environmentally friendly ways with cash, we can make Malaysia greener and more sustainable, at least in the ESG sense.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;f54512bd-f014-43ac-a1b1-39628b5990d7?format=jpeg&quot; alt=&quot;The whole business plan of eco-finance&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;sustainable-business-model&quot;&gt;Sustainable Business Model&lt;&#x2F;h2&gt;
&lt;p&gt;We can summarize the business circulation here and make it sustainable as well, where we can become self-sustaining:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Collect carbon credits from users&lt;&#x2F;li&gt;
&lt;li&gt;Pool up carbon credits&lt;&#x2F;li&gt;
&lt;li&gt;Certify carbon credits with Bursa Carbon Exchange (BCX)&lt;&#x2F;li&gt;
&lt;li&gt;Sell carbon credits to companies who need them&lt;&#x2F;li&gt;
&lt;li&gt;Reward users to encourage eco-friendly spending habits&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Basically that&#x27;s it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;project-random-thingy&quot;&gt;Project Random Thingy&lt;&#x2F;h2&gt;
&lt;p&gt;Now for the random odds and ends of the project. The prototype is hosted online; it&#x27;s not fully complete, and it&#x27;s a prototype rather than an MVP. Considering we had only 24 hours to go from idea to execution, I mean, it&#x27;s good for a first-timer. Right...?&lt;&#x2F;p&gt;
&lt;p&gt;You can access the hosted project prototype &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;payhack-2024.vercel.app&#x2F;&quot;&gt;here&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Feel free to browse through it. If you have any questions, please email me, and I&#x27;ll personally explain it to you. You can also see the admin page by accessing &lt;a rel=&quot;noopener nofollow noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;payhack-2024.vercel.app&#x2F;admin&quot;&gt;here&lt;&#x2F;a&gt; or by changing the &lt;code&gt;&#x2F;dashboard&lt;&#x2F;code&gt; to &lt;code&gt;&#x2F;admin&lt;&#x2F;code&gt;. It looks something like this:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;7ee8a2c3-551c-4726-9551-f7d6ab743391?format=jpeg&quot; alt=&quot;Admin Dashboard&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;wrap-up&quot;&gt;Wrap-Up&lt;&#x2F;h2&gt;
&lt;p&gt;In conclusion, Eco Finance aims to make carbon footprints visible to individuals, encouraging eco-friendly spending habits and contributing to a greener Malaysia. By monetizing carbon credits and rewarding users, we create a sustainable ecosystem that benefits both the environment and the economy. It&#x27;s a real pity that we couldn&#x27;t make it to the finals though T.T&lt;&#x2F;p&gt;
&lt;p&gt;Hope you guys like it :D&lt;&#x2F;p&gt;
&lt;p&gt;Oh btw! Once again: you can read the full story &lt;a href=&quot;&#x2F;blog&#x2F;first-hackathon-experience&#x2F;&quot;&gt;here&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>First Physical Hackathon Experience</title>
        <published>2024-12-02T00:00:00+00:00</published>
        <updated>2024-12-02T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://jienweng.github.io/blog/first-hackathon-experience/"/>
        <id>https://jienweng.github.io/blog/first-hackathon-experience/</id>
        
        <content type="html" xml:base="https://jienweng.github.io/blog/first-hackathon-experience/">&lt;p&gt;This post records what our first physical hackathon taught us as a math-heavy team entering a software-first environment. I focus on concrete lessons from ideation, mentoring, and pitching that we can reuse in future competitions.&lt;&#x2F;p&gt;
&lt;p&gt;I found myself staring at my phone, thumb hovering over the share button. &quot;Am I really qualified for this?&quot; I thought to myself. &quot;What if we make fools of ourselves?&quot; The doubts crept in like unwanted guests. But then another voice, stronger and more determined, pushed back: &quot;When else will we get a chance like this? We might not be coders, but we know how to solve problems. Isn&#x27;t that what hackathons are really about?&quot;&lt;&#x2F;p&gt;
&lt;p&gt;I reached out to my fellow mathematics coursemates: Janice, Roius, and Gwyn. &quot;Hey, want to do something crazy?&quot; I asked, half expecting them to laugh it off. To my surprise, their responses came quickly, filled with enthusiasm despite (or maybe because of) our collective inexperience. Only 2 of us had ever participated in a hackathon before, and Gwyn had never even written a line of code. But there we were, four mathematics students from UTAR, signing up for one of the most competitive hackathons in the country.&lt;&#x2F;p&gt;
&lt;p&gt;The looks we got when we arrived were priceless. &quot;Are you guys from Computer Science?&quot; someone asked, eyeing our team with curiosity. We exchanged glances and grinned. &quot;Nope, we&#x27;re Math students, haha...&quot; The mixture of surprise and skepticism on their faces was something I&#x27;ll never forget. In those moments, our outsider status felt both terrifying and weirdly empowering.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;25b6ad42-8235-45ae-9505-a2c296a8ca2a?format=jpeg&quot; alt=&quot;Our first breakfast together at PayHack 2024 - nervous but excited! The calm before the storm.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;November 30th marked our first day, and it was a blur of ideation and learning. We came up with an innovative idea: creating an app to visualize transaction carbon footprints, with the ability to pool and trade carbon footprint surpluses with companies in need. On paper, it sounded promising—a perfect blend of fintech and environmental consciousness.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;b16e0c66-cefb-4cc6-8bf6-de5a087cc513?format=jpeg&quot; alt=&quot;Getting grilled during our mentoring session - each question pushed us to think deeper about our solution&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Then came the intense mentoring sessions with Johan Nasir. He didn&#x27;t hold back. &quot;What happens if the carbon credits are manipulated?&quot; he&#x27;d challenge. &quot;How do you ensure the authenticity of the footprint data?&quot; Another round of rethinking. Each session felt like an intense rotan session—tough love at its finest. He&#x27;d poke holes in our solutions, push us to think deeper, and force us to confront real-world problems that actually needed solving.&lt;&#x2F;p&gt;
&lt;p&gt;The week before the hackathon was a rollercoaster. Competing against almost 100 teams from across Malaysia, we were shocked to make it to the Top 32. When we saw Johan&#x27;s name among our judges and received the news of our advancement, our excitement was through the roof. But reality quickly set in—we had a major problem. None of us had real web development experience. My knowledge was limited to Python, SQL, and vanilla HTML&#x2F;CSS&#x2F;JavaScript.&lt;&#x2F;p&gt;
&lt;p&gt;With just four days between Tuesday and Friday, we had to make crucial technical decisions while juggling our internships. After intense research, we settled on Vue.js and Flask. Janice even traveled all the way from JB to KL for this. Every evening after our internships, we&#x27;d dive into tutorials, trying to absorb as much as we could about our chosen tech stack.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;0d444922-77c4-4d91-b645-fb96b7fd5d17?format=jpeg&quot; alt=&quot;3 AM and still debugging - running on determination and coffee&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The hackathon itself was intense. When exhaustion hit, we took turns napping wherever we could. I couldn&#x27;t get a tent, so I made do with a random bench for quick 4-hour power naps before jumping back into coding. Everything seemed to be going smoothly until the morning of the submission. At 8:30 AM, just after breakfast and 90 minutes before the deadline, our backend crashed—the information couldn&#x27;t be parsed properly. In a desperate move, we had to hardcode some components just to make the submission deadline at 10 AM.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;0784bdc4-dc31-4b2b-9d5a-2e04677a9ba1?format=jpeg&quot; alt=&quot;Presenting our final product: MouManTai - tracking and trading carbon footprints from financial transactions&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;While we didn&#x27;t make it to the Top 10, watching the final pitches was an eye-opening experience. The winning teams showcased solutions that were not just technically impressive but also deeply thoughtful about real-world implementation. One team&#x27;s blockchain-based remittance system particularly stood out - their attention to regulatory compliance and market research was incredible. &quot;We should have done more market validation,&quot; I thought to myself. &quot;Next time, we need to focus not just on the technical solution but on the whole business case.&quot;&lt;&#x2F;p&gt;
&lt;p&gt;The top teams also demonstrated masterful presentation skills. Their pitches weren&#x27;t just about features - they told compelling stories about why their solutions mattered. Each slide was carefully crafted, each demo was flawlessly executed, and their responses to judges&#x27; questions showed deep understanding of both technical and business aspects. I made mental notes: &quot;Practice the pitch more. Know your numbers. Be ready for any question.&quot;&lt;&#x2F;p&gt;
&lt;p&gt;Despite not making the finals, I did win a Samsung monitor in the lucky draw, a small consolation that brought some laughs to our tired team.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;3d119ab7-d3bf-4632-8675-6fbaf1963c08?format=jpeg&quot; alt=&quot;A silver lining - winning a Samsung monitor in the lucky draw!&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Looking back now, the sleepless nights and endless debugging sessions blur together, but certain moments stand crystal clear: the late-night breakthrough when our first feature finally worked, the proud smile on Johan&#x27;s face during our final presentation, and most importantly, the unshakeable bond formed between four mathematicians who dared to dream.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;cdn.cosmos.so&#x2F;266c9030-58ad-4536-9e60-c88edb6df8a9?format=jpeg&quot; alt=&quot;Jien Weng, Janice, Gwyn and Roius&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;To Janice, Roius, and Gwyn: thank you for taking this leap of faith with me. For believing that our mathematical minds could contribute something meaningful to the tech world. To Johan: your guidance went beyond mentorship—you showed us that innovation comes from daring to be different. And to PayNet and JomHack: thank you for creating a space where even mathematics students could discover their potential in technology.&lt;&#x2F;p&gt;
&lt;p&gt;They say the best stories come from stepping out of your comfort zone. Well, we didn&#x27;t just step—we took a giant leap. And while our first hackathon journey has ended, something tells me this is just the beginning of our adventure in the tech world. The equations and formulas we&#x27;ve studied for years are no longer just abstract concepts—they&#x27;re tools waiting to be applied in the vast playground of technology.&lt;&#x2F;p&gt;
&lt;p&gt;After all, who says mathematicians can&#x27;t be hackers too? Sometimes the best innovations come from those who dare to cross the boundaries between disciplines, who bring fresh perspectives to old problems. And maybe, just maybe, that&#x27;s exactly what the tech world needs more of.&lt;&#x2F;p&gt;
</content>
        
    </entry>
</feed>
