For some folks, programming is just a job. I have nothing against this; certainly, it is laudable to try and separate your profession from your true interests. I cannot, however, understand programmers who do not even try to automate their way out of problems. I thought we were supposed to be lazy!

The event that pissed me off

A friend was asked to compile information about an article from a large text file provided by a "programmer" working at the company. The guy had very specific expectations of the format that the information should be put into, almost as if it was going to be input into a different program.

His imagined workflow: he would provide a list of IDs for which the information was needed, and my friend would text search through the file (for example via Ctrl-F) for each ID and then manually write down the information in the requested format. The zinger? The text file was in fact JSON, which seemed to be the direct result of an API call!

To summarize, this programmer:

  • got some information from an API
  • received that information formatted as JSON
  • ...the native data format for JavaScript (this person was a web dev)
  • needed output that would also be parsed by a program

and decided the best thing to do was to hand it off to a non-programmer, who would have to do the whole thing manually! There were around 30 IDs, not that many in the grand scheme of things, but still tedious.

I decide to automate it

Even before I knew the input was JSON, the exact output format requested tipped me off that there might be scope to script part of it with sed and awk, which is why I was interested in the problem in the first place. Once it turned out to be JSON, I could simply parse it using a library!
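As a rough sketch of what "parse it using a library" amounts to: Python's standard json module turns the text into lists and dicts, after which picking out the wanted records is a plain membership test. The `id` and `title` keys, the sample records, and the `wanted` set are all made up for illustration; the real file used different keys.

```python
import json

# Hypothetical sample standing in for the API dump; the real keys differed.
raw = """
[
  {"id": "DE-546", "title": "First book"},
  {"id": "DE-1200", "title": "Second book"},
  {"id": "FR-77", "title": "Third book"}
]
"""

# The IDs that were asked about (also made up).
wanted = {"DE-546", "FR-77"}

# json.loads parses a string; json.load does the same for an open file.
records = json.loads(raw)

# Keep only the records whose ID is in the wanted set.
matches = [r for r in records if r.get("id") in wanted]

for r in matches:
    print(r["id"], r["title"])
```

No Ctrl-F required: the search over every ID happens in one list comprehension.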

It only took 60 lines of Python, which isn't bad for code that is mostly exploratory in nature! The one other thing I did was use Emacs keyboard macros to lightly edit the data into a Python dictionary, which I use in the code below.

The program

It's a bit messy, because the input data was messy as well. The numbers dict was prepared using Emacs macros rather than by having Python load it from a file. This is completely fine for one-off scripts!

You don't actually have to read this.

import json
import re

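def transform(inp):
    """Turn an ID like "DE-546" into "DE 0546".

    The hyphen becomes a space, and numbers shorter than four digits
    get a single leading zero.
    """
    out = inp.split("-")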
    num = out[1]
    if len(num) < 4:
        out = "{} 0{}".format(out[0], num)
    else:
        out = "{} {}".format(out[0], num)
    return out

with open("book_details.txt", "r") as file:
    inp = json.load(file)

# this is the format of the entries (the real dict was prepared with Emacs macros)
numbers = {
    "DE-546": "title"
}

for i in inp:
    number = None
    # The ID sits in one of two columns depending on the record.
    if "__EMPTY_12" in i:
        if i["__EMPTY_12"] not in numbers:
            continue
        else:
            orig = i["__EMPTY_12"]
            orig = transform(orig)
            number = "{} {}".format(orig, numbers[i["__EMPTY_12"]])
    elif "__EMPTY_11" in i:
        if i["__EMPTY_11"] not in numbers:
            continue
        else:
            orig = i["__EMPTY_11"]
            orig = transform(orig)
            number = "{} {}".format(orig, numbers[i["__EMPTY_11"]])
    else:
        continue
    print(number)
    for key, value in i.items():
        if "EMPTY" not in key:
            title = value
            print(title)
    if "__EMPTY" in i:
        author = i["__EMPTY"].strip()
        print(author)
    if "__EMPTY_3" in i:
        language = i["__EMPTY_3"].strip()
        print(language)
    if "__EMPTY_10" in i:
        education = i["__EMPTY_10"]
        if "mb" in education.lower():
            education = i.get("__EMPTY_9", "empty")
        education = re.sub(r"[/] *", "-", education)
        print(education.strip())
    if "__EMPTY_2" in i:
        daisy = i["__EMPTY_2"]
        print(" ".join(daisy.split(" ")[:2]))
    print()
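
For anything less one-off, the numbers dict could be built in Python instead of with editor macros. A minimal sketch, assuming the requested IDs arrive as plain text with one `ID<TAB>label` pair per line; that input format and the `load_numbers` helper are assumptions for illustration, not part of the script above.

```python
import io

def load_numbers(lines):
    """Build an {id: label} dict from lines of the form 'ID<TAB>label'."""
    numbers = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # partition splits on the first tab only, so labels may contain tabs
        ident, _, label = line.partition("\t")
        numbers[ident.strip()] = label.strip()
    return numbers

# io.StringIO stands in for an open file in this example.
sample = io.StringIO("DE-546\ttitle\n\nDE-1200\tanother title\n")
numbers = load_numbers(sample)
print(numbers)
```

With a real file, the call would be wrapped in `with open(...) as f:` and `f` passed in place of `sample`.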