I generated 1.1TB of string data for a project, overnight. It’s just one big text file on a disk. Now I just have to grep through it to find the particular patterns I need… that 1.1TB will probably come down to 500-600GB by the end of it, but I can see the pattern-matching process taking the rest of the weekend…

Python and command-line utilities have been super useful at generating this data, and definitely helped the process along. As a reminder to myself, these are the commands I’m using to “post-process” the data:

Look for lines in input.csv which don’t match this pattern, and echo them to output.csv:

$ grep -vE "([A-K]{3}),\1" input.csv > output.csv
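
To sanity-check what that pattern actually throws away, here is a minimal Python sketch using the same regular expression. The sample lines are made up for illustration and are not taken from the real data; `keep_line` is a hypothetical helper name.

```python
import re

# Same pattern as the grep command: three letters in the range A-K,
# a comma, then the exact same three letters again (backreference \1).
PATTERN = re.compile(r"([A-K]{3}),\1")


def keep_line(line: str) -> bool:
    """Return True if `grep -v` would keep the line, i.e. the pattern
    does not match anywhere in it."""
    return PATTERN.search(line) is None


# Made-up sample lines, purely for illustration.
sample = ["ABC,ABC,DEF", "ABC,DEF,GHI", "KAB,ABK", "XABC,ABCX"]
kept = [line for line in sample if keep_line(line)]
print(kept)
```

Note that the pattern is unanchored, so a repeated triple is caught wherever it appears in a line, not just at the start.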

Split output.csv into files 800MiB in size (split's M suffix means 1024 × 1024 bytes), called data_n, where n is an 8-digit incremental number starting from zero (e.g. data_00000000):

$ split -a 8 -d -b 800M output.csv data_

For each data file in the directory, give it the .csv extension:

$ for f in data*; do mv "$f" "$f.csv"; done
Jodi on Twitter (Twitter)
“Pia Klemp, the German ship captain who rescued migrants in the Mediterranean, as she refuses a medal from the mayor of Paris.”

“I’m not a humanitarian. I am not there to ‘aid’. I stand in solidarity. We do not need medals. We do not need authorities deciding who is a ‘hero’ and who is ‘illegal’. In fact they are in no position to make this call, because we are all equal.

“What we need are freedom and rights. It is time we call out hypocrite honouring and fill the void with social justice. It is time we cast all medals into spearheads of revolution!

“Documents and housing for all!

“Freedom of movement and residence!”

Pia Klemp, August 2019

Cool. Sometime recently, Flickr migrated their signup process to their own native system. I’ve been curious about Flickr under its new owners, but as recently as a couple of months ago, signing up still required creating a Yahoo account — something there was zero chance of me doing.

I’m not sure what I’d use Flickr for nowadays, but I do have fond memories of the service in its heyday of the mid-2000s. I can syndicate from this site to my Flickr photostream, although that means I’ll probably have to start using the full-sized original images in posts instead of the resized versions I use currently.

For whatever reason, I’m finding I’m just not “feeling it” with any of the miniatures I’ve been painting recently… not so much with the models themselves, more with the results of my efforts. I know I can often be overly critical of my own work, but this feels different to usual.

I’ve been experimenting with using Python to generate text-based data for an experimental spin-off app from our team at work, and for my first “real world” use of Python, I’m pretty impressed with how efficient it is for doing this.

I’ve got a simple script iterating over a collection of strings to produce all possible combinations of those strings. The output of that script is being fed into a text file via Bash. So far it’s generating ~52GB of data in roughly 15 minutes, and it’s only part-way through the possible combinations. I’ve had to kill my test run because otherwise I’m going to run out of disk space on my laptop SSD! CPU usage was a moderate 26%, and RAM usage was tiny, at only ~2.8MB. Previous attempts at this using other languages tended to saturate one or both of these resources in fairly short order.
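
For reference, a minimal sketch of what a script along those lines might look like. This is not the actual script: the token list, the combination length, and the comma-joined output format are all assumptions, and "all possible combinations" is read here as the Cartesian product with repetition. The point is that `itertools.product` yields one combination at a time, so memory use stays flat however much output is produced.

```python
import itertools
import sys
from typing import Iterable, Iterator


def combos(tokens: Iterable[str], length: int) -> Iterator[str]:
    """Lazily yield every ordered selection of `length` tokens
    (repetition allowed), joined with commas."""
    # itertools.product is a generator: only the current combination
    # is held in memory, not the whole result set.
    for combo in itertools.product(list(tokens), repeat=length):
        yield ",".join(combo)


def main() -> None:
    # Hypothetical inputs, purely for illustration.
    tokens = ["AAA", "AAB", "ABA"]
    write = sys.stdout.write
    for line in combos(tokens, 2):
        write(line + "\n")


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as generate.py, it can be redirected into a file from Bash with `python generate.py > output.txt`.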

It’s fun to try out a new (to me) tool every now and then!