Programming Project Ideas

Keeping a log of the areas and ideas I find interesting and would like to work on.

  • Start learning math, a little every day. The problems I'd like to work on require more math than I currently know.
  • Entity resolution (ER). I spent a couple of years building and iterating on an entity resolution pipeline. The whole experience was fascinating and I learned a ton, so I want to keep working on it.
    • Lots to experiment with here:
      • Building entity resolution for different scales (e.g., 100 records vs. 100 million records)
      • Trying different approaches to ER
    • Blocking (cutting down the number of record pairs that actually need to be compared) was always a problem that interested me. There are lots of clever approaches here, and a lot to learn.
    • Analyzing the accuracy of entity resolution on very large datasets was an unsolved problem for us, and one I think I could make headway on.
    • My experience with entity resolution was in a distributed environment (Apache Spark), but I always wanted to try building ER to run on a single machine.
    • One-size-fits-all vs. tailored approaches. Which one fits better likely depends on how heterogeneous the data is.
  • Ethereum blockchain. I find it cool, don’t understand it at all, and would love to learn more about it. I’d like to try building some sort of blockchain project. Also, given the sheer amount of data available, I think this would be a rich vein to mine for both building analytics and scratching my itch for building ETL pipelines.
  • Machine learning and data science. This ties back to the math item above, too. Some really cool stuff here that I don't understand at all.
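A quick reminder to myself of what blocking means in the entity resolution item above: instead of comparing every pair of records (n(n-1)/2 comparisons), records are grouped by a cheap blocking key and only pairs that share a block are compared. This is a minimal sketch, not the pipeline I worked on; the `block_key` function (first three letters of the lowercased name) and the sample records are hypothetical and chosen purely for illustration. Real systems use smarter keys such as phonetic codes, sorted neighborhoods, or locality-sensitive hashing.

```python
from collections import defaultdict
from itertools import combinations


def block_key(record: dict) -> str:
    # Hypothetical blocking key: first three letters of the stripped, lowercased name.
    return record["name"].strip().lower()[:3]


def candidate_pairs(records: list[dict]) -> set[tuple[int, int]]:
    # Group record ids by their blocking key.
    blocks: dict[str, list[int]] = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec["id"])

    # Only records that land in the same block become candidate pairs.
    pairs: set[tuple[int, int]] = set()
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


records = [
    {"id": 1, "name": "Jonathan Smith"},
    {"id": 2, "name": "jon smith"},
    {"id": 3, "name": "Jane Doe"},
    {"id": 4, "name": "Janet Doe"},
    {"id": 5, "name": "Bob Jones"},
]

pairs = candidate_pairs(records)
all_pairs = len(records) * (len(records) - 1) // 2
print(sorted(pairs))  # [(1, 2), (3, 4)]
print(len(pairs), "candidate pairs vs", all_pairs, "for all-pairs")  # 2 candidate pairs vs 10 for all-pairs
```

The trade-off is recall: two records for the same entity whose names differ in the first three letters (a typo, a nickname like "Bill" vs. "William") never get compared. Balancing fewer comparisons against missed matches is a big part of what makes blocking interesting, especially at the 100-million-record end of the scale.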