<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://beevus77.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://beevus77.github.io/" rel="alternate" type="text/html" /><updated>2026-05-18T15:26:10-07:00</updated><id>https://beevus77.github.io/feed.xml</id><title type="html">Computational Catharsis</title><subtitle>personal description</subtitle><author><name>James Berkeley Larsen</name></author><entry><title type="html">Arch Madness</title><link href="https://beevus77.github.io/posts/2026/05/post/" rel="alternate" type="text/html" title="Arch Madness" /><published>2026-05-16T00:00:00-07:00</published><updated>2026-05-16T00:00:00-07:00</updated><id>https://beevus77.github.io/posts/2026/05/arch-madness</id><content type="html" xml:base="https://beevus77.github.io/posts/2026/05/post/"><![CDATA[<p>Sorry for the lack of blog posts over the past few months; things have been a little crazy.</p>

<h1 id="arch-madness">Arch Madness</h1>

<p>Every month, Jane Street launches a <a href="https://www.janestreet.com/puzzles/current-puzzle/">new puzzle</a>. This month, they unveiled something called “Arch Madness”, which is an absolutely beautiful little game.
After I showed their puzzle to my friends David P and James S, we spent many hours late into the night figuring it out. But these clever friends of mine didn’t stop there! David P went ahead and vibe-coded a little tool to help solve the puzzles.</p>

<p>I have always loved little daily puzzle games, and I saw the potential to turn David’s tool into a quick, beautiful puzzle game. So, after an afternoon date with Cursor, I am launching my daily puzzle game; check it out <a href="https://jamesblarsen.com/projects/arch-madness/">here</a>. Please let me know what you think! And if you like it, try making puzzles of your own with the <a href="https://jamesblarsen.com/projects/arch-madness/explorer/">explorer tool</a>. If you make a particularly good one and send me the JSON, it may just be featured!</p>]]></content><author><name>James Berkeley Larsen</name></author><category term="arch-madness" /><summary type="html"><![CDATA[Sorry for the lack of blog posts the past few months, things have been a little crazy.]]></summary></entry><entry><title type="html">POMDP</title><link href="https://beevus77.github.io/posts/2026/02/pomdp/" rel="alternate" type="text/html" title="POMDP" /><published>2026-02-21T00:00:00-08:00</published><updated>2026-02-21T00:00:00-08:00</updated><id>https://beevus77.github.io/posts/2026/02/pomdp</id><content type="html" xml:base="https://beevus77.github.io/posts/2026/02/pomdp/"><![CDATA[<h1 id="minesweeper-as-a-markov-process">Minesweeper as a Markov Process</h1>
<p>As promised two weeks ago, I’ve started my attempt to formulate Minesweeper in the language of reinforcement learning (RL). The mathematical backbone of RL is a Markov decision process (MDP), consisting of a state space $\mathcal{S}$, an action space $\mathcal{A}_s$ for each state $s \in \mathcal{S}$, transition probabilities $P_a(s,s')$ for states $s,s'\in\mathcal{S}$ and action $a \in \mathcal{A}_s$, and a reward function $R_a(s,s')$. The goal is to find the optimal policy $\pi^*$, which picks the best action $a$ from any given state, i.e., the policy that maximizes the expected discounted return
\begin{equation}
  \pi^* = \arg\max_\pi \mathbb{E}_{a_t \sim \pi(s_t),\, s_{t+1} \sim P_{a_t}(s_t, \cdot)} \left[ \sum_{t=0}^\infty \gamma^t R_{a_t}(s_t, s_{t+1}) \right]
\end{equation}
for some discount factor $\gamma \in [0,1]$.</p>
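<p>To make the objective concrete, here is a minimal sketch of value iteration on a made-up two-state MDP, written with the same $P_a(s,s')$ and $R_a(s,s')$ notation. The states <code>s0</code>/<code>s1</code>, the actions <code>stay</code>/<code>go</code>, and all the numbers are illustrative assumptions, not a Minesweeper model. Value iteration repeatedly applies the Bellman optimality update $V(s) \leftarrow \max_a \sum_{s'} P_a(s,s')\left[R_a(s,s') + \gamma V(s')\right]$ and then reads off $\pi^*$ as the greedy policy with respect to the converged values.</p>

<pre><code class="language-python"># Toy MDP in the post's notation: P[a][s][s2] is P_a(s, s2) and R[a][s][s2]
# is R_a(s, s2). States, actions, and numbers are illustrative assumptions.
STATES = ["s0", "s1"]
ACTIONS = ["stay", "go"]

P = {
    "stay": {"s0": {"s0": 1.0, "s1": 0.0}, "s1": {"s0": 0.5, "s1": 0.5}},
    "go": {"s0": {"s0": 0.2, "s1": 0.8}, "s1": {"s0": 0.0, "s1": 1.0}},
}
R = {
    "stay": {"s0": {"s0": 0.0, "s1": 0.0}, "s1": {"s0": 0.0, "s1": 0.0}},
    "go": {"s0": {"s0": 1.0, "s1": 2.0}, "s1": {"s0": 0.0, "s1": 1.0}},
}


def q_value(V, s, a, gamma):
    """Expected reward of taking a in s, plus the discounted next-state value."""
    return sum(P[a][s][s2] * (R[a][s][s2] + gamma * V[s2]) for s2 in STATES)


def value_iteration(gamma=0.9, tol=1e-10):
    """Apply the Bellman optimality update until the values stop changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        new_V = {s: max(q_value(V, s, a, gamma) for a in ACTIONS) for s in STATES}
        delta = max(abs(new_V[s] - V[s]) for s in STATES)
        V = new_V
        if tol >= delta:
            break
    # pi* acts greedily with respect to the converged value function.
    policy = {s: max(ACTIONS, key=lambda a: q_value(V, s, a, gamma)) for s in STATES}
    return V, policy
</code></pre>

<p>On this toy problem, <code>value_iteration(gamma=0.9)</code> should pick <code>go</code> in both states, with $V(s_1) = 10$ and $V(s_0) = 9/0.82 \approx 10.98$. The convergence guarantee for value iteration relies on $\gamma$ being strictly less than 1; an episodic game like Minesweeper can also use $\gamma = 1$ as long as every game is guaranteed to end.</p>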

<p>Naïvely, the state of our Minesweeper game is the board configuration, an action corresponds to selecting/revealing a cell, and the reward reflects whether or not we have exploded. But here things get interesting… if our MDP knew the board configuration, the problem would be trivial! The optimal policy is to reveal all cells that aren’t mines. Somehow, we need to obscure the location of the mines from our MDP. Enter the partially observable Markov decision process (POMDP). Amazingly, in this sense Minesweeper is more mathematically complicated than, say, Go or Chess, where the players have perfect information. To reflect the partial observability of Minesweeper, we need to augment our MDP with an observation space $\mathcal{O}$ (plus an observation model describing which observations, here the revealed numbers, each hidden state produces) and base our policy and the corresponding actions on observations from $\mathcal{O}$ rather than on the hidden state. This simple distinction already rules out several RL algorithms in their textbook form, e.g. vanilla Q-learning, whose guarantees assume the agent observes the true Markov state.</p>
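<p>One way to see what the observation space buys us: instead of the hidden state, a POMDP agent can act on a <em>belief</em>, a probability distribution over the hidden states consistent with everything it has observed. The sketch below is a brute-force illustration on a tiny board; it is not a description of how the RLMS solver works, and the function names and board encoding are hypothetical. Assuming every mine layout is equally likely a priori and the total mine count is known, it enumerates all layouts that reproduce the revealed numbers and returns each hidden cell’s marginal probability of holding a mine.</p>

<pre><code class="language-python">from itertools import combinations


def neighbors(r, c, rows, cols):
    """Yield the (up to 8) cells adjacent to (r, c) on a rows x cols board."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr, dc) != (0, 0) and rr in range(rows) and cc in range(cols):
                yield rr, cc


def mine_probabilities(rows, cols, revealed, num_mines):
    """Return each hidden cell's probability of holding a mine.

    revealed maps (row, col) to the number shown on that revealed cell.
    Every placement of num_mines mines on the hidden cells that agrees
    with all revealed numbers is treated as equally likely.
    """
    hidden = [(r, c) for r in range(rows) for c in range(cols)
              if (r, c) not in revealed]
    mine_counts = {cell: 0 for cell in hidden}
    consistent = 0
    for layout in combinations(hidden, num_mines):
        mines = set(layout)
        # Keep this layout only if it reproduces every observed number.
        if all(sum(n in mines for n in neighbors(r, c, rows, cols)) == shown
               for (r, c), shown in revealed.items()):
            consistent += 1
            for cell in mines:
                mine_counts[cell] += 1
    if consistent == 0:
        raise ValueError("no mine layout matches the observations")
    # Marginal posterior = fraction of consistent layouts with a mine there.
    return {cell: mine_counts[cell] / consistent for cell in hidden}
</code></pre>

<p>For example, on a 1×3 strip with one mine where the left cell shows a 1, <code>mine_probabilities(1, 3, {(0, 0): 1}, num_mines=1)</code> returns <code>{(0, 1): 1.0, (0, 2): 0.0}</code>. The enumeration grows combinatorially with the number of hidden cells, so it only illustrates the idea and quickly becomes infeasible on a full Beginner board.</p>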

<p>To be continued…</p>

<p>As an aside, I love the way Andy Jones explains the <a href="https://andyljones.com/posts/policy-gradient.html">policy gradient theorem</a>.</p>
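<p>The heart of the policy gradient theorem is the score-function (likelihood-ratio) identity $\nabla_\theta \mathbb{E}_{a \sim \pi_\theta}[r(a)] = \mathbb{E}_{a \sim \pi_\theta}[r(a)\,\nabla_\theta \log \pi_\theta(a)]$. As a minimal sketch, the code below checks that identity for the single-step (bandit) special case with a softmax policy, comparing the score-function gradient against a finite-difference estimate. The preferences <code>theta</code>, the rewards, and the function names are illustrative assumptions, and the full multi-step theorem is not reproduced here.</p>

<pre><code class="language-python">import math


def softmax(theta):
    """Turn action preferences theta into action probabilities pi_theta(a)."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward(theta, rewards):
    """J(theta) = sum over a of pi_theta(a) * r(a) for a one-step bandit."""
    return sum(p * r for p, r in zip(softmax(theta), rewards))


def score_function_gradient(theta, rewards):
    """grad J computed as E_a[r(a) * grad log pi_theta(a)], summed exactly.

    For a softmax policy, d log pi_theta(a) / d theta_j = 1[a == j] - pi_theta(j).
    """
    pi = softmax(theta)
    grad = [0.0] * len(theta)
    for a, (p_a, r_a) in enumerate(zip(pi, rewards)):
        for j in range(len(theta)):
            grad_log_pi = (1.0 if a == j else 0.0) - pi[j]
            grad[j] += p_a * r_a * grad_log_pi
    return grad


def finite_difference_gradient(theta, rewards, eps=1e-6):
    """Central-difference estimate of grad J, used only as a sanity check."""
    grad = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += eps
        down[j] -= eps
        diff = expected_reward(up, rewards) - expected_reward(down, rewards)
        grad.append(diff / (2 * eps))
    return grad
</code></pre>

<p>With, e.g., <code>theta = [0.3, -1.2, 0.5]</code> and <code>rewards = [1.0, -0.5, 2.0]</code>, the two gradients should agree to roughly six decimal places. Sampling-based policy gradient methods (PPO included) estimate gradients of this general form from sampled actions instead of summing over every action exactly.</p>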

<h1 id="update-from-my-cursor-agent">Update from my Cursor Agent</h1>

<p><strong>First RL agent (no reward shaping).</strong> Model: PPO with a CNN state encoder (3-channel obs: revealed, flagged, adj). Trained on Beginner 9×9 for 500k steps (25:11 wall-clock, ~331 steps/s). Result: best=-0.141, mean_r=-0.182, wins=0. Exported to ONNX and wired into the RLMS solver dropdown so you can watch it play; it’s bad on purpose, to motivate reward shaping. Next step: add light reward shaping (+0.02 per safe reveal) and retrain.</p>]]></content><author><name>James Berkeley Larsen</name></author><category term="reinforcement learning" /><summary type="html"><![CDATA[Minesweeper as a Markov Process As promised two weeks ago, I’ve started my attempt to formulate Minesweeper in the language of reinforcement learning (RL). The mathematical backbone of RL is a Markov decision process (MDP), consisting of a state space $\mathcal{S}$, an action space $\mathcal{A}_s$ for each state $s \in \mathcal{S}$, transition probabilities $P_a(s,s')$ for states $s,s'\in\mathcal{S}$ and action $a \in \mathcal{A}_s$, and the reward functions $R_a(s,s')$. The goal of the MDP is to find the policy $\pi$ that determines the best action $a$ from any given state, i.e., \begin{equation} \pi^* = \arg\max_\pi \mathbb{E}_{a_t \sim \pi(s_t),\, s_{t+1} \sim P_{a_t}(s_t, \cdot)} \left[ \sum_{t=0}^\infty \gamma^t R_{a_t}(s_t, s_{t+1}) \right] \end{equation} for some discount factor $\gamma \in [0,1]$.]]></summary></entry><entry><title type="html">Rich Sutton Plays Minesweeper</title><link href="https://beevus77.github.io/posts/2026/02/bitter-lesson/" rel="alternate" type="text/html" title="Rich Sutton Plays Minesweeper" /><published>2026-02-07T00:00:00-08:00</published><updated>2026-02-07T00:00:00-08:00</updated><id>https://beevus77.github.io/posts/2026/02/bitter-lesson</id><content type="html" xml:base="https://beevus77.github.io/posts/2026/02/bitter-lesson/"><![CDATA[<h1 id="month-two">Month Two</h1>
<p>Turns out, a New Year’s resolution to blog every week is too ambitious. Maybe I can update that to once a month? Here goes a simple February blog post, coming back into the CompCath sandbox to see what I can put together.</p>

<p>I’m still on a bit of an unhealthy Minesweeper kick, so I channeled that energy this morning into a vibe-coded autosolver <a href="https://jamesblarsen.com/projects/rlms/">here</a>. It is still very much a work in progress, and I couldn’t have done it without my loyal Cursor agent.</p>

<h1 id="the-bitter-lesson">The Bitter Lesson</h1>
<p>One of my primary motivations for coding my own version of Minesweeper was to make a foray into reinforcement learning in the coming weeks. Back in the <a href="https://acme.byu.edu/degree-requirements">ACME Junior Core</a>, we briefly experimented with RL in a <a href="https://labs.acme.byu.edu/Volume2/Gymnasium/Gymnasium.html">Q-Learning lab</a>, but I’d be lying if I said I fully internalized its beauty and simplicity.</p>

<p>Recently, I stumbled upon <a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">Rich Sutton’s Bitter Lesson</a>, and it resonated with my intellect on a profound level. While I might go so far as to make it required reading for anyone living in the post-ChatGPT world, at the very least I hope the AI-curious will take the time to give it a read. The TL;DR: Rich is one of the pioneers of RL, and the bitter lesson is that general methods that leverage additional compute ultimately beat methods built on additional human knowledge. Instead of trying to simplify complex phenomena, to truly innovate in this day and age we need to just let the wonders of computational search and learning scale. That release, letting go of the relentless and futile search for understanding by the human mind and just embracing the inconceivable complexity, embodies the spirit of Computational Catharsis better than any other idea I’ve seen. Thanks, Rich.</p>

<p>Maybe soon you will see my humble attempt at a reinforcement learning agent that conquers the simple world of Minesweeper go live in your browser. Can it beat <a href="https://davidnhill.github.io/JSMinesweeper/index.html?board=30x16x99">the best human-designed solvers I could find</a>? I’m skeptical.</p>

<h1 id="miscellanea">Miscellanea</h1>
<ul>
  <li>Scientifically accurate <a href="https://giftarticle.ft.com/giftarticle/actions/redeem/58ca881a-b6f1-40c4-a634-b2a13a906926">Financial Times article</a> using quantum physics as a metaphor for current geopolitical relations, written by the former president of Armenia, who happens to be a physicist</li>
  <li>Dario Amodei gives his <a href="https://www.darioamodei.com/essay/the-adolescence-of-technology">two cents</a> on AI alignment, condemning doomerism while still encouraging restraint</li>
</ul>]]></content><author><name>James Berkeley Larsen</name></author><category term="vibe coding" /><category term="reinforcement learning" /><summary type="html"><![CDATA[Month Two Turns out, a new year’s resolution to blog every week is too ambitious. Maybe I can update that to once a month? Here goes a simple February blog post, coming back into the CompCath sandbox to see what I can put together.]]></summary></entry><entry><title type="html">New Year, New Me</title><link href="https://beevus77.github.io/posts/2026/01/new-year/" rel="alternate" type="text/html" title="New Year, New Me" /><published>2026-01-10T00:00:00-08:00</published><updated>2026-01-10T00:00:00-08:00</updated><id>https://beevus77.github.io/posts/2026/01/new-year</id><content type="html" xml:base="https://beevus77.github.io/posts/2026/01/new-year/"><![CDATA[<h1 id="trajectory">Trajectory</h1>
<p>With the arrival of the arbitrarily chosen day set aside by our society as <em>the first of the year</em> come the ritualistic resolutions meant to change our behavior. One such resolution of mine was to revive <strong>Computational Catharsis</strong> as a sandbox for my thoughts. Here goes nothing.</p>

<p>Beyond that simple goal (coupled with a minimal habit of the <a href="https://jamesclear.com/">James Clear</a> variety) and a few others, I was thinking more abstractly about who I am, who I want to become, and what it takes to bridge the gap between the two. My mind kept gravitating towards the idea of a <em>trajectory</em>. The fundamental idea behind calculus is that the dynamics of a system are best understood by studying rates of change. If I want to predict or modify my own personal trajectory, perhaps I need to take a personal derivative of sorts: I propose that the derivative of who I am with respect to time is who I am becoming, the bridge to who I want to become. This state of being changes smoothly rather than in sudden jumps, hence the inevitable failure of overly ambitious goals.</p>

<h1 id="why-we-remember">Why We Remember</h1>
<p>I finished Charan Ranganath’s debut pop-science book, <em>Why We Remember</em>, and was quite pleased with it. Beyond reinforcing my habit of regularly discounting my own recollections as “false memories”, Ranganath provided a medium for me to marvel at the remarkable computational abilities of our biological hardware. I found it similar in vibe and scientific rigor to Matthew Walker’s <em>Why We Sleep</em>, but maybe not as life-changing, for the simple reason that sleep studies inform behavior modification more naturally than memory studies do. Embarrassingly, my biggest takeaway might be the brief introduction to the idea of <a href="https://en.wikipedia.org/wiki/Cryptomnesia">cryptomnesia</a>, something I’ve needed a word for since hearing the similarities between the openings of Leon Bridges’ <em>Coming Home</em> and Taylor Swift’s <em>Lover</em>.</p>

<h1 id="miscellanea">Miscellanea</h1>
<p>Other curiosities I spent time with this past week:</p>
<ul>
  <li><a href="https://www.lesswrong.com/posts/gpyqWzWYADWmLYLeX/how-ai-is-learning-to-think-in-secret">A LessWrong disciple frets about LLMs inventing their own languages for reasoning</a></li>
  <li><a href="https://danwang.co/2025-letter/">Dan Wang continues the Silicon Valley vs China discussion</a></li>
  <li><a href="https://arxiv.org/abs/2601.04621v1">Garnet Chan and co tackle FeMoCo</a></li>
</ul>]]></content><author><name>James Berkeley Larsen</name></author><category term="first post" /><category term="meta" /><summary type="html"><![CDATA[Trajectory With the arrival of the arbitrarily chosen day set aside by our society as the first of the year comes the ritualistic resolutions meant to change our behavior. One such resolution of mine was to revive Computational Catharsis as a sandbox for my thoughts. Here goes nothing.]]></summary></entry><entry><title type="html">Hello World</title><link href="https://beevus77.github.io/posts/2024/11/hello-world/" rel="alternate" type="text/html" title="Hello World" /><published>2024-11-20T00:00:00-08:00</published><updated>2024-11-20T00:00:00-08:00</updated><id>https://beevus77.github.io/posts/2024/11/blog-post-1</id><content type="html" xml:base="https://beevus77.github.io/posts/2024/11/hello-world/"><![CDATA[<h1 id="genesis">Genesis</h1>
<p>With November quickly fading away, I am thinking ahead to future iterations of the universe. Why not include a personal online blog/CV in such future iterations? Hence, the birth of <strong><em>Computational Catharsis</em></strong>.</p>]]></content><author><name>James Berkeley Larsen</name></author><category term="first post" /><category term="meta" /><summary type="html"><![CDATA[Genesis With November quickly fading away, I am thinking ahead to future iterations of the universe. Why not include a personal online blog/CV in such future iterations? Hence, the birth of Computational Catharsis.]]></summary></entry></feed>