Coding Python packages with AI

I tried using some new LLM tools to code two entire Python packages (instead of editing a handful of lines at a time, as I had done previously). It went well! These tools are not perfect, but they are useful!

Read more…

Why your active learning algorithm may not do better than random

I am a big fan of active learning, but I am also acutely aware of its potential failure modes. A common failure mode is random-like performance: achieving no better "success"1 in picking points than a random policy. A flawed implementation can certainly produce this.2 However, for some problems it may not be possible to beat random-like performance at all. In this post I try to explain why.
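To make the comparison concrete, here is a minimal pool-based sketch of benchmarking an uncertainty-sampling policy against random selection. Everything here (the toy data, the `pick_uncertain` helper, the budget of 50 queries) is my own illustration, not from the post itself:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Toy pool-based setup: a small labeled seed set, an unlabeled pool,
# and a held-out test set from the same distribution.
def make_data(n):
    X = rng.normal(size=(n, 5))
    y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
    return X, y

X, y = make_data(500)
X_test, y_test = make_data(500)
seed, pool = list(range(20)), list(range(20, 500))

def pick_random(candidates, model):
    return int(rng.choice(candidates))

def pick_uncertain(candidates, model):
    # Query the pool point whose predicted probability is closest to 0.5.
    proba = model.predict_proba(X[candidates])[:, 1]
    return candidates[int(np.argmin(np.abs(proba - 0.5)))]

for policy in (pick_random, pick_uncertain):
    labeled, candidates = list(seed), list(pool)
    for _ in range(50):  # query budget
        model = LogisticRegression().fit(X[labeled], y[labeled])
        i = policy(candidates, model)
        labeled.append(i)
        candidates.remove(i)
    final = LogisticRegression().fit(X[labeled], y[labeled])
    print(policy.__name__, f"test accuracy: {final.score(X_test, y_test):.3f}")
```

If the two printed accuracies are indistinguishable over repeated seeds, the active policy is exhibiting exactly the random-like performance described above.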

Read more…

Using LLMs to improve my Chinese

I've been learning Chinese for almost 10 years now, but I still produce awkward-sounding sentences when I speak. A few months ago I thought "why not use LLMs to help me speak more naturally?", and found that it does not take much prompting to get useful feedback. Here is a conversation with Claude 3.5 from a few months ago:

Read more…

Conceptual confusion about desirable outputs of reaction prediction models

In the literature about machine learning for retrosynthesis, one line of work tries to predict chemical reactions, either in the forward direction (ie what products will A + B form) or in the backward direction (ie what reactants could react to produce molecule C). Such models are often trained on datasets of known reactions like Pistachio or USPTO, with the hope of generalizing to new "correct" reactions. However, this formulation of the problem overlooks a lot of subtleties about what makes a reaction "correct". In this post I will present a more nuanced mental model which (hopefully) clarifies some ambiguities.

Read more…

Punishing poor reviewers at CVPR

This year CVPR pledged to make all authors participate in peer review, and to reject papers from authors who wrote low-quality reviews.1 Last week they confirmed on Twitter that they had followed through, rejecting 19 papers. Presumably this is a tiny fraction of all submissions, but I hope it acts as an effective deterrent for future authors. At the very least, I'm glad that a major conference tried something like this!

Why don't ML conferences provide reviewer instructions?

I remember when I first received an invitation to review papers for an ML conference in late 2020. What surprised me most was not that I was being invited (even though that was a surprise, since I was just a second-year PhD student who had only just finished writing a paper myself). Instead, it was the lack of instruction on how to assess the papers: essentially just "write your reviews by date X" and "evaluate novelty, significance, soundness, etc". In fact, in all the years since, I don't think I have ever received explicit instructions for reviewing ML conference papers.1

Read more…

Alpha over LLMs

On a recent podcast, Patrick McKenzie mentioned the idea of "alpha over LLMs": does a publisher produce text with any meaningful advantage over asking an LLM? I think this is an important question for anybody trying to write regularly, even if their readership is small (eg this blog). I interpret it as:

  • People should not put out content which is obviously wrong and can be corrected by an LLM (eg "I have theory X" where asking an LLM provides clear and convincing counter-arguments to X).
  • People should also not put out content which is worse than the answer you get from asking an LLM (eg the same content but explained less clearly).

I will generally try to uphold this principle in future blog posts.

Is offline model-based optimization a realistic problem? (I'm not convinced)

This is a "quickpost": a post which I have tried to write quickly, without very much editing/polishing. For more details on quickposts, see this blog post.

Offline model-based optimization (OMBO in this post) is essentially one-shot optimization from a fixed dataset. You see the data, do whatever you want with it, then propose a batch of query points, which are evaluated exactly once. Hopefully one (or, ideally, most) of the query points is optimal or near-optimal. End of task.
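To make the setup concrete, here is a minimal sketch of the OMBO protocol. The toy objective, the random-forest surrogate, and the batch size are all illustrative choices of mine, not from any specific benchmark:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Fixed offline dataset: inputs X and noisy scores y from an unknown objective.
X = rng.uniform(-2.0, 2.0, size=(200, 4))
y = -np.sum(X**2, axis=1) + 0.1 * rng.normal(size=200)

# Step 1: fit a surrogate on the fixed dataset (no further queries allowed).
surrogate = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Step 2: propose one batch of query points, eg by scoring random candidates
# with the surrogate and keeping the top k.
candidates = rng.uniform(-2.0, 2.0, size=(10_000, 4))
scores = surrogate.predict(candidates)
batch = candidates[np.argsort(scores)[-8:]]  # k = 8 proposals

# Step 3: the batch is evaluated once by the true objective, and the task ends.
true_values = -np.sum(batch**2, axis=1)
print("Best proposed value:", true_values.max())
```

The whole task is those three steps; there is no second round of queries, which is what makes the setup one-shot.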

Read more…

Experiment: more posts, lower quality

Since starting my new position at Valence, my efforts to write more on my blog have clearly been successful.1 However, my internal list of "things I would like to write a blog post about" is growing far faster than I can actually write the posts, at least for the topics where I think it is worth putting an opinion online.

Read more…