Blogging in the LLM age: a second golden age for blogs?

LLMs (large language models) are currently scraping all text on the public internet, training on it, and spitting out variants of that text in response to queries. I think this fact makes now a golden age for blog writing. If you have ever thought about writing a blog, the time is now.

This idea is not unique or original1, but I am completely convinced by it. The purpose of this post is to explain it in my own words.

Read more…

Chebyshev Scalarization Explained

I've been reading about multi-objective optimization recently.1 Many papers note a limitation of "linear scalarization" approaches: they may be unable to represent all Pareto-optimal solutions (if this sentence does not make sense to you, see background). Chebyshev scalarization is sometimes mentioned as an alternative which can represent all solutions. However, these papers mention it in passing without a proper explanation, and I did not find a good explanation of it online.2

After doing a bit of my own research,3 I found that Chebyshev scalarization is actually not too complicated, so I thought I would explain it online. In this post, I:

  • Give definitions for Chebyshev scalarization (for both maximization and minimization)
  • Give a simple proof that it can represent all Pareto-optimal solutions
  • Explain its relationship to linear scalarization via $\ell_p$ norms
  • Give a geometric interpretation via an interactive visualization
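To make the first bullet concrete (with notation that is mine, not necessarily the post's): given objectives $f_1, \dots, f_k$ to be minimized, weights $\lambda_i > 0$, and an ideal point $z^*$ satisfying $z_i^* \le f_i(x)$ for all feasible $x$, the weighted Chebyshev scalarization solves

$$\min_{x \in \mathcal{X}} \; \max_{i=1,\dots,k} \; \lambda_i \left( f_i(x) - z_i^* \right)$$

The name comes from the Chebyshev ($\ell_\infty$) norm of the weighted deviations; replacing the max with an $\ell_p$ norm for finite $p$ gives a family that recovers linear scalarization at $p = 1$.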

Read more…

Some Beijing travel tips

Earlier this year I went to Beijing. Here are some miscellaneous travel tips. What I am writing here is probably not unique or original; treat it simply as my personal emphasis / endorsement.

Read more…

Coding python packages with AI

I tried using some new LLM tools to code 2 entire python packages (instead of editing a handful of lines at a time, which is what I did previously). It went well! These tools are not perfect, but they are useful!

Read more…

Why your active learning algorithm may not do better than random

I am a big fan of active learning, but I am also acutely aware of its potential failure modes. A common failure mode is random-like performance: achieving no better "success"1 in picking points than a random policy. Indeed, it is possible to experience this when the implementation is flawed.2 However, in some problems it may not be possible to beat random-like performance. In this post I try to explain why.

Read more…

Using LLMs to improve my Chinese

I've been learning Chinese for almost 10 years now, but still make awkward-sounding sentences when I speak. A few months ago I thought "why not use LLMs to help me speak more naturally", and found that it does not take much prompting to get useful feedback. Here is a conversation with Claude 3.5 from a few months ago:

Read more…

Conceptual confusion about desirable outputs of reaction prediction models.

In the literature about machine learning for retrosynthesis, one line of work tries to predict chemical reactions, either in the forward direction (ie what products will A + B form) or in the backward direction (ie what reactants could react to produce molecule C). Such models are often trained on datasets of known reactions like Pistachio or USPTO, with the hope of generalizing to new "correct" reactions. However, this formulation of the problem overlooks a lot of subtleties about what makes a reaction "correct". In this post I will present a more nuanced mental model which (hopefully) clarifies some ambiguities.

Read more…

Punishing poor reviewers at CVPR

This year CVPR pledged to make all authors participate in peer review, and reject papers from authors who wrote low-quality reviews.1 Last week they confirmed on Twitter that they followed through with this and rejected 19 papers. Presumably this is a tiny fraction of the overall papers submitted, but I hope this is an effective deterrent for future authors. At the very least, I'm glad that some major conference tried something like this!

Why don't ML conferences provide reviewer instructions?

I remember when I first received an invitation to review papers for an ML conference in late 2020. What surprised me most was not that I was being invited (even though that was a surprise, since I was just a second-year PhD student who had only just finished writing a paper myself). Instead, it was the lack of instruction on how to assess the papers: essentially just "write your reviews by date X" and "evaluate novelty, significance, soundness, etc". In fact, in all the years since, I think I have never received explicit instructions for reviewing ML conference papers.1

Read more…

Alpha over LLMs

On a recent podcast, Patrick McKenzie mentioned the idea of "alpha over LLMs": does a publisher produce text with any meaningful advantage over asking an LLM? I think this is an important question for anybody trying to regularly write, even if the readership is small (eg this blog). I interpret this as:

  • People should not put out content which is obviously wrong and can be corrected by an LLM (eg "I have theory X" where asking an LLM provides clear and convincing counter-arguments to X).
  • People should also not put out content which is worse than the answer you get from asking an LLM (eg the same content but explained less clearly).

I will generally try to uphold this principle in future blog posts.

Is offline model-based optimization a realistic problem? (I'm not convinced)

This is a "quickpost": a post which I have tried to write quickly, without very much editing/polishing. For more details on quickposts, see this blog post.

Offline model-based optimization (OMBO in this post) is essentially 1-shot optimization using a fixed dataset. You see data, do whatever you want, then propose a batch of query points, which are then evaluated. Hopefully some (ideally most) of the query points are optimal (or near-optimal). End of task.
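The whole protocol fits in a few lines. A toy sketch (the objective, the surrogate, and all names here are made up by me for illustration, not taken from any OMBO benchmark):

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden_objective(x):
    # Unknown to the algorithm; only used for the single final evaluation.
    return -(x - 2.0) ** 2

# Offline phase: a fixed dataset, and no further queries allowed.
X = rng.uniform(-5, 5, size=50)
y = hidden_objective(X) + rng.normal(0, 0.1, size=50)

# "Do whatever you want" with the data: here, fit a quadratic surrogate.
a, b, c = np.polyfit(X, y, deg=2)
x_star = -b / (2 * a)  # vertex of the fitted (concave) parabola

# Propose one batch of query points around the surrogate's optimum.
batch = x_star + rng.normal(0, 0.2, size=8)

# The single evaluation round -- end of task.
results = hidden_objective(batch)
print(results.max())  # hopefully near the true optimum of 0
```

The difficulty, of course, is that in real OMBO problems the surrogate can be badly wrong in regions the dataset never covered.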

Read more…

Experiment: more posts, lower quality

Since starting my new position at Valence, my efforts to write more on my blog have clearly been successful.1 However, my internal list of "things I would like to write a blog post about" is growing far faster than I can actually write posts that I think are worth putting online.

Read more…

Hiring is hard: why good applicants without connections can get overlooked.

Knowing people is a great way to get hired. Nepotism is one obvious explanation (ie people hire you because they like you, or to gain favors from people who like you). I (along with most other people) think that nepotism is bad: it's unfair, and gives jobs to people who are probably not that good at them. However, it is a mistake to think that nepotism is the only reason why people who are known get hired, or that this practice is always bad. Some better reasons are:

Read more…

Double checking that Gauche's fingerprint kernels are positive definite.

GAUCHE is a library for Gaussian processes in chemistry. I contributed a small amount to GAUCHE several years ago but am not an active developer. I recently learned that some new fingerprint kernels were added. In this post I examine whether these kernels are positive definite (PD), and whether any restrictions apply.

Using a small set of lemmas (of which two were new to me), I am convinced that all but two of the kernels are PD, without being restricted to binary vectors. The remaining two I am unsure of, but I don't claim that they are not PD.
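As a flavour of what such a check can look like, here is a quick numerical sanity check on the standard Tanimoto (Jaccard) fingerprint kernel — not necessarily one of the GAUCHE kernels in question, and a numerical check is only evidence, not a proof:

```python
import numpy as np

rng = np.random.default_rng(0)

def tanimoto_kernel(A, B):
    """Tanimoto (Jaccard) similarity between rows of binary matrices A and B."""
    inner = A @ B.T
    norm_a = (A * A).sum(axis=1)[:, None]
    norm_b = (B * B).sum(axis=1)[None, :]
    return inner / (norm_a + norm_b - inner)

# Random binary "fingerprints"; guard against all-zero rows (undefined similarity).
X = rng.integers(0, 2, size=(100, 64)).astype(float)
X[X.sum(axis=1) == 0, 0] = 1.0

K = tanimoto_kernel(X, X)
min_eig = np.linalg.eigvalsh(K).min()
print(min_eig)  # non-negative up to floating-point error if the kernel is PD
```

A single Gram matrix with non-negative eigenvalues is consistent with PD-ness; proving it requires a lemma-style argument like the one in the post.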

Read more…

Review of NeurIPS 2024 and predictions for ML in 2025

I was fortunate to attend NeurIPS 2024, arguably the largest and most influential machine learning conference in the world (thanks Valence for sponsoring my trip 🙏). In this post I will try to summarize what I learned at NeurIPS, and cautiously make some predictions for the year ahead.

Read more…

What can eduroam teach us about building research infrastructure?

Eduroam is a fantastic piece of academic infrastructure: students/researchers from thousands of universities around the world can automatically connect to WiFi at any partner institution using login details from their home institution. To me it's surprising that it exists, given that it has many characteristics of projects which academia is terrible at accomplishing:

Read more…

Scientific conferences as approximate Bayesian inference

Scientists should ideally form their beliefs based on evidence and update their beliefs as new evidence arrives. Unfortunately, humans are far from perfect Bayesian thinkers and therefore may struggle to do this properly. In this post I explain how conferences help scientists perform better Bayesian inference.

Read more…

Thoughts on Google Vizier

Vizier, described in a recent paper from Google, is a black-box optimization algorithm deployed for "numerous research and production systems at Google". Allegedly, this one algorithm works well in a wide range of settings (something which the "no free lunch" theorem suggests might not be possible). In this post I provide my thoughts on what key design decisions likely make this algorithm work well.

Read more…

Being 'data-driven' does not mean that you should use bad data.

Relying on data rather than intuition to make decisions is usually a good thing, but not always. When one needs to make a decision about things for which there is no good data, it might be better to rely on intuition rather than the best proxy available. Here are some examples where I think an intuition-based approach can be better than a data-driven approach (but still worse than a data-driven approach with good data):

Read more…

Problems with the top-k diversity metric for diverse optimization

NOTE: this blog post can be run as a Jupyter notebook. I re-ordered the cells to make it easier to read; to reproduce all the plots, see the instructions at the end of the post.

Background

"Diverse optimization" has been a popular topic at machine learning conferences for a few years now, particularly in the "AI for drug discovery" sub-field. In this context, the goal of "optimization" algorithms is to suggest promising drug candidates, where "promising" means maximizing one (or more) objective functions. An example of an objective function could be a docking score (an approximate simulation of the interactions between a protein and a molecule). "Diverse" optimization further requires that an algorithm produce multiple distinct candidate solutions. This is typically desired when the objective functions don't fully capture everything we want (for example, a drug candidate also having low toxicity). The hope is that a diverse set of candidates has a higher chance of containing at least one useful candidate than a non-diverse set.

Read more…

How I chose a static site generator

Recently, I wanted to update my website to look a bit more polished (and support additional features such as automatically generating pages for my publications). In the end I decided to completely switch from building my website with Jekyll to nikola instead. This post explains my thought process for this (in case anybody else is considering a similar switch).

Read more…

Reasons to have a website

I created this website because I thought (and continue to think) that having a website can benefit one's career. Essentially, a professional website serves as an accessible source of information about oneself for prospective employers, coworkers, and employees. Unless you put something horrible on your website, the effect should at worst be neutral, so there is essentially no downside to having one.

In the remainder of the post I will lay out a more detailed case for having a website and address some potential hesitations people might have about creating one.

Read more…

A Quick Tutorial on Bash Quotes

Today I learned way more about quotations in bash than I ever thought I needed to know. I thought I would highlight the interesting use case I discovered: writing a script that executes arbitrary commands, which requires some special trickery. First, let's quickly review some facts about bash quotes.

Read more…

How to Keep a Communal Fridge Clean

Last month, my class decided that we should get a fridge for the class study room. This brought up an important question: how would the fridge be cleaned? I thought this was an interesting problem and deserved some discussion, both from a practical and a theoretical standpoint.

Read more…

Language Travel Logs: Japanese 2018

One dream I have had ever since I started learning languages is to go to another country and use that language to communicate. This August I had my first opportunity to do so, during a 2-week trip to Japan. In this post, I will outline the preparation I did before going, describe where I was able to use Japanese while there, and evaluate my success.

Read more…

Why you should never be certain of your beliefs: a Bayesian perspective

People are notoriously bad at estimating their percent confidence in their beliefs, as explained further in this Wikipedia article. Something I thought about recently is what effect this overconfidence has from a Bayesian perspective. After a bit of math, I came to the conclusion that having extreme confidence in your beliefs (0% or 100%) implies that you would be unable to change your beliefs if shown evidence to the contrary. I believe this simple argument suggests that it is very irrational to hold prior beliefs of 0% or 100%. If you do feel that certain, you should instead choose a very high value (99.99%) or a very low value (0.001%), but always leave some room for error.
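The math really is a one-liner. By Bayes' rule,

$$P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}$$

so if $P(H) = 0$ the numerator is zero and the posterior stays $0$ for any evidence $E$; and if $P(H) = 1$ then $P(E) = P(E \mid H)$, so the posterior stays $1$. Extreme priors are fixed points of Bayesian updating.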

Read more…

An Overview of Gradient Boosting and Popular Libraries for it.

Everybody doing machine learning wants the best models possible. The aim of this blog article is the following:

  1. To provide an introduction to the machine learning technique known as boosting, and specifically gradient boosting.
  2. To compare/contrast boosting with other ensemble methods, such as bagging.
  3. To explain and compare several popular gradient boosting frameworks, specifically XGBoost, CatBoost, and LightGBM.
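To give a flavour of point 1 before you click through: in least-squares gradient boosting, each weak learner is fit to the current residuals, which are exactly the negative gradient of the squared-error loss. A minimal sketch with decision stumps (all names and parameter values here are my own, not taken from XGBoost/CatBoost/LightGBM):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(x, residual):
    """Fit a depth-1 regression tree (stump) to `residual` by squared error."""
    best = None
    for t in np.unique(x)[:-1]:  # skip the max: it would leave an empty right side
        left, right = residual[x <= t], residual[x > t]
        lm, rm = left.mean(), right.mean()
        err = ((left - lm) ** 2).sum() + ((right - rm) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda z: np.where(z <= t, lm, rm)

# Toy 1-D regression data.
x = rng.uniform(0, 6, 200)
y = np.sin(x) + rng.normal(0, 0.1, 200)

# Boosting loop: each stump is fit to the residuals (the negative gradient of
# the squared-error loss) and added to the ensemble with a small learning rate.
pred = np.full_like(y, y.mean())
learning_rate = 0.3
for _ in range(50):
    stump = fit_stump(x, y - pred)
    pred += learning_rate * stump(x)

mse = ((y - pred) ** 2).mean()
print(mse)  # should approach the noise floor
```

The libraries above add much more (regularization, second-order updates, clever split finding, categorical features), but this residual-fitting loop is the common core.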

Read more…

Turning Adam Optimization into SGD

Motivation

This strange question came up while I was working on a machine learning project to generate embeddings. Working with the version of PyTorch available on our DGX (similar to version 0.3.1), I found there was an optimizer called SparseAdam but not one called SparseSGD. Since what I really wanted was SGD, I wondered: could I turn the Adam optimizer into an SGD optimizer by setting the hyperparameters \(\beta_1\), \(\beta_2\), and \(\epsilon\)?
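Without spoiling the post's answer, a single-step calculation makes the idea plausible: with \(\beta_1 = \beta_2 = 0\) and a very large \(\epsilon\), Adam's update tends to \(-(\mathrm{lr}/\epsilon)\, g\), i.e. plain SGD with a rescaled learning rate. A standalone numerical sketch (plain NumPy rather than PyTorch; the gradient values are made up):

```python
import numpy as np

def adam_step(g, lr, beta1, beta2, eps):
    # One Adam step from zero optimizer state (t = 1), with bias correction.
    m = (1 - beta1) * g
    v = (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1)  # bias correction at t = 1
    v_hat = v / (1 - beta2)
    return -lr * m_hat / (np.sqrt(v_hat) + eps)

g = np.array([0.5, -2.0, 3.0])  # a made-up gradient
sgd_update = -0.01 * g          # SGD with lr = 0.01

# beta1 = beta2 = 0 kills the moving averages; a huge eps dominates sqrt(v_hat),
# so Adam's update becomes roughly -(lr / eps) * g, i.e. rescaled SGD.
eps = 1e6
adam_update = adam_step(g, lr=0.01 * eps, beta1=0.0, beta2=0.0, eps=eps)

print(np.abs(adam_update - sgd_update).max())  # tiny difference
```

The interesting parts (what happens after the first step, and whether the hyperparameter extremes are even allowed) are in the post.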

Read more…

Paper Review: A Computational Approach to Organizational Structure

Motivation

Ever since I started doing internships, the concept of efficient organizations has piqued my interest. In every workplace I have been in, time is wasted by inefficient transfers of information: for example, long meetings where most of the content is irrelevant to most of the people, or repeated interactions with co-workers where you explain the same thing to each of them. Assuming employees make ~$40/hour, a 1-hour meeting with 100 people costs $4000! If these meetings are not productive, the company gets a negative return on its time investment, which essentially means the company is wasting money. Clearly, organizational efficiency is a financially important objective.

Much of an organization's efficiency can be linked to its structure. Long, big meetings are usually a consequence of a strong hierarchical organization, where work is done by employees, then synced to a centralized node (a boss), and then possibly recapitulated to the workers in a meeting. However, are there better ways to structure an organization that would save time?

Read more…

How to guess a Kanji's on-yomi in 4 easy steps

Lately I have been putting a lot of effort into studying Japanese, to prepare for my upcoming trip to Japan at the end of summer 2018. While learning the readings for various kanji (Chinese character) words, I've noticed that a lot of their pronunciations (on-yomi) are related to the Chinese pronunciations in an interesting way. Let me explain my empirical theory about how to convert Chinese pronunciations into Japanese ones:

Read more…