Age at creation for programming languages stats

Introduction

In this blog post (notebook) we ingest programming languages creation data from the “Programming Language DataBase” (PLDB) and visualize several statistics of it.

We do not examine the data source, and we do not want to reason too much about the data using the stats. We started this notebook just wanting to make the bubble charts (both 2D and 3D). Nevertheless, we are tempted to make and justify statements like:

  • Pareto holds, as usual.
  • Language creators tend to do it more than once.
  • Beware the Second system effect.


Data ingestion

Here we get the TSV file with the Wolfram Function Repository (WFR) function ImportCSVToDataset:

url = "https://pldb.io/posts/age.tsv";
dsData = ResourceFunction["ImportCSVToDataset"][url, "Dataset", "FieldSeparators" -> "\t"];
dsData[[1 ;; 4]]

Here we summarize the data using the WFR function RecordsSummary:

ResourceFunction["RecordsSummary"][dsData, "MaxTallies" -> 12]

Here is a list of languages we use to “get orientated” in the plots below:

lsFocusLangs = {"C++", "Fortran", "Java", "Mathematica", "Perl 6", "Raku", "SQL", "Wolfram Language"};

Here we find the seven most frequent tags (used in the plots below):

lsTopTags = ReverseSortBy[Tally[Normal@dsData[All, "tags"]], Last][[1 ;; 7, 1]]

(*{"pl", "textMarkup", "dataNotation", "grammarLanguage", "queryLanguage", "stylesheetLanguage", "protocol"}*)

Here we add the column “group” based on the focus languages and the most frequent tags:

dsData = dsData[All, Append[#, "group" -> Which[MemberQ[lsFocusLangs, #id], "focus", MemberQ[lsTopTags, #tags], #tags, True, "other"]] &];
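
The priority encoded by the Which expression above (focus languages first, then the top tags, then “other”) can be checked on a few toy rows. This is a minimal sketch; the helper groupOf and the rows "langB" and "langC" are hypothetical and not part of the PLDB data:

```mathematica
(* Hypothetical helper mirroring the Which[...] rule used for the "group" column *)
Clear[groupOf];
groupOf[row_Association, focusLangs_List, topTags_List] :=
  Which[
   MemberQ[focusLangs, row["id"]], "focus",     (* focus languages take priority *)
   MemberQ[topTags, row["tags"]], row["tags"],  (* otherwise a top tag becomes the group *)
   True, "other"];                              (* everything else *)

(* Toy rows, made up for illustration *)
toyRows = {
   <|"id" -> "Raku", "tags" -> "pl"|>,
   <|"id" -> "langB", "tags" -> "dataNotation"|>,
   <|"id" -> "langC", "tags" -> "someRareTag"|>};

groupOf[#, {"Raku"}, {"pl", "dataNotation"}] & /@ toyRows
(* {"focus", "dataNotation", "other"} *)
```

Note that a focus language is labeled “focus” even when its tag is one of the top tags.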

Distributions

Here are the distributions of the variables/columns:

  • “ageAtCreation”
    • i.e. “How old was the creator?”
  • “appeared”
    • i.e. “In what year was the programming language proclaimed?”
Association @ Map[# -> Histogram[Normal@dsData[All, #], 20, "Probability", Sequence[ImageSize -> Medium, PlotTheme -> "Detailed"]] &, {"ageAtCreation", "appeared"}]

Here are the corresponding box-whisker charts:

aBWCs = Association@
  Map[# -> BoxWhiskerChart[Normal@dsData[All, #], "Outliers", BarOrigin -> Left, ImageSize -> Medium, AspectRatio -> 1/2, PlotRange -> Full] &, {"ageAtCreation", "appeared"}];
aBWCs
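
The box-whisker charts do not print the numbers behind the boxes. Here is a minimal sketch of a per-column statistics table, assuming non-numeric entries (such as Missing[]) should simply be dropped; the helper columnStats and the toy ages are hypothetical:

```mathematica
(* Summary statistics of a column; non-numeric entries (e.g., Missing[]) are dropped first *)
Clear[columnStats];
columnStats[vals_List] :=
  Module[{x = Select[vals, NumericQ], q},
   q = Quartiles[x];
   <|"Count" -> Length[x], "Min" -> Min[x], "Q1" -> q[[1]], "Median" -> Median[x],
     "Q3" -> q[[3]], "Max" -> Max[x], "Mean" -> N@Mean[x]|>];

(* Toy usage with made-up ages *)
columnStats[{25, 30, Missing[], 41, 28, 60}]
```

With the data ingested above, the two tables could be obtained with `Dataset@Association@Map[# -> columnStats[Normal@dsData[All, #]] &, {"ageAtCreation", "appeared"}]`.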

Pareto principle manifestation

Number of creations

Here is the Pareto principle plot for the number of created (or renamed) programming languages per creator (using the WFR function ParetoPrinciplePlot):

ResourceFunction["ParetoPrinciplePlot"][Association[Rule @@@ Tally[Normal@dsData[All, "creators"]]], ImageSize -> Large]

We can see that ≈25% of the creators correspond to ≈50% of the languages.
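
The ≈25%/≈50% reading can also be computed directly rather than read off the plot. A minimal sketch: sort the per-creator counts in descending order, accumulate them, and find the smallest fraction of creators whose languages cover a given fraction of all languages. The helper paretoCreatorsFraction is hypothetical (it is not part of ParetoPrinciplePlot), and the toy counts are made up:

```mathematica
(* Smallest fraction of creators whose counts cover at least the fraction p of the total *)
Clear[paretoCreatorsFraction];
paretoCreatorsFraction[counts_List, p_?NumericQ] :=
  Module[{acc},
   acc = Accumulate[ReverseSort[counts]];  (* cumulative counts, largest first *)
   N[LengthWhile[acc, # < p*Last[acc] &] + 1]/Length[counts]];

(* Toy counts of languages per creator *)
paretoCreatorsFraction[{5, 3, 1, 1, 1, 1, 1, 1}, 0.5]
(* 0.25 *)
```

On the PLDB data the per-creator counts are `Tally[Normal@dsData[All, "creators"]][[All, 2]]`; if the reading of the plot is right, the call with 0.5 should give a value close to 0.25.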

Popularity

Obviously, programmers can and do use more than one programming language. Nevertheless, it is interesting to see the Pareto principle plot for the languages’ “mind share” based on the estimated numbers of users.

ResourceFunction["ParetoPrinciplePlot"][Normal@dsData[Association, #id -> #numberOfUsersEstimate &], ImageSize -> Large]

Remark: Again, the plot above is “wrong” — programmers use more than one programming language.


Correlations

In order to see meaningful correlations in the pairwise plots, we take logarithms of the large-valued columns:

dsDataVar = dsData[All, {"appeared", "ageAtCreation", "numberOfUsersEstimate", "numberOfJobsEstimate", "rank", "measurements", "pldbScore"}];
dsDataVar = dsDataVar[All, Append[#, <|"numberOfUsersEstimate" -> Log10[#numberOfUsersEstimate + 1], "numberOfJobsEstimate" -> Log10[#numberOfJobsEstimate + 1]|>] &];

Remark: Note that we “cheat” by adding 1 before taking the logarithms.
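
The “cheat” keeps the zero estimates on the plots: Log10[0] evaluates to -Infinity, while Log10[0 + 1] is 0, and large values are barely affected by the shift. A minimal check with made-up estimates:

```mathematica
(* Made-up users/jobs estimates, including a zero *)
vals = {0, 9, 99, 999};

(* Shifting by 1 maps 0 to 0 instead of -Infinity *)
Log10[vals + 1]
(* {0, 1, 2, 3} *)
```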

We obtain the tables of correlation plots using the newly introduced, experimental function PairwiseListPlot. If we remove the rows with zeroes, some of the correlations become more obvious. Here is a tab view of the two correlation tables:

TabView[{
"data" -> PairwiseListPlot[dsDataVar, PlotTheme -> "Business", ImageSize -> 800],
"zero-free data" -> PairwiseListPlot[dsDataVar[Select[FreeQ[Values[#], 0] &]], PlotTheme -> "Business", ImageSize -> 800]}]

Remark: Given the names of the data columns and their obvious interpretations, we can say that the stronger correlations make sense.


Bubble chart 2D

In this section we make an informative 2D bubble chart with tooltips.

First, note that not all triplets of “appeared”, “ageAtCreation”, and “numberOfUsersEstimate” are unique:

ReverseSortBy[Tally[Normal[dsData[All, {"appeared", "ageAtCreation", "numberOfUsersEstimate"}]]], Last][[1 ;; 3]]

(*{{<|"appeared" -> 2017, "ageAtCreation" -> 33, "numberOfUsersEstimate" -> 420|>, 2}, {<|"appeared" -> 2023, "ageAtCreation" -> 39, "numberOfUsersEstimate" -> 11|>, 1}, {<|"appeared" -> 2022, "ageAtCreation" -> 55, "numberOfUsersEstimate" -> 6265|>, 1}}*)

Hence, we make two datasets: (1) one for the core bubble chart, and (2) one for the labeling function, which looks up each bubble's language by its position rather than by its coordinates:

aData = GroupBy[Normal@dsData, #group &, KeyTake[#, {"appeared", "ageAtCreation", "numberOfUsersEstimate"}] &];
aData2 = GroupBy[Normal@dsData, #group &, KeyTake[#, {"appeared", "ageAtCreation", "numberOfUsersEstimate", "id", "creators"}] &];

Here is the labeling function (see the section “Applications” of the function page of BubbleChart):

Clear[LangLabeler];
LangLabeler[v_, {r_, c_}, ___] := Placed[Grid[{
{Style[aData2[[r, c]]["id"], Bold, 12], SpanFromLeft},
{"Creator(s):", aData2[[r, c]]["creators"]},
{"Appeared:", aData2[[r, c]]["appeared"]},
{"Age at creation:", aData2[[r, c]]["ageAtCreation"]},
{"Number of users:", aData2[[r, c]]["numberOfUsersEstimate"]}
}, Alignment -> Left], Tooltip];

Here is the bubble chart:

BubbleChart[
aData,
FrameLabel -> {"Appeared", "Age at Creation"},
PlotLabel -> "Number of users estimate",
BubbleSizes -> {0.05, 0.14},
LabelingFunction -> LangLabeler,
AspectRatio -> 1/2.5,
ChartStyle -> 7,
PlotTheme -> "Detailed",
ChartLegends -> {Keys[aData], None},
ImageSize -> 1000
]

Remark: The programming language J is a clear outlier because of its creators’ ages.


Bubble chart 3D

In this section we make a 3D bubble chart.

As in the previous section, we define two datasets: one for the core plot and one for the tooltips:

aData3D = GroupBy[Normal@dsData, #group &, KeyTake[#, {"appeared", "ageAtCreation", "measurements", "numberOfUsersEstimate"}] &];
aData3D2 = GroupBy[Normal@dsData, #group &, KeyTake[#, {"appeared", "ageAtCreation", "measurements", "numberOfUsersEstimate", "id", "creators"}] &];

Here is the corresponding labeling function:

Clear[LangLabeler3D];
LangLabeler3D[v_, {r_, c_}, ___] := Placed[Grid[{
{Style[aData3D2[[r, c]]["id"], Bold, 12], SpanFromLeft},
{"Creator(s):", aData3D2[[r, c]]["creators"]},
{"Appeared:", aData3D2[[r, c]]["appeared"]},
{"Age at creation:", aData3D2[[r, c]]["ageAtCreation"]},
{"Number of users:", aData3D2[[r, c]]["numberOfUsersEstimate"]}
}, Alignment -> Left], Tooltip];

Here is the 3D chart:

BubbleChart3D[
aData3D,
AxesLabel -> {"appeared", "ageAtCreation", "measurements"},
PlotLabel -> "Number of users estimate",
BubbleSizes -> {0.02, 0.07},
LabelingFunction -> LangLabeler3D,
BoxRatios -> {1, 1, 1},
ChartStyle -> 7,
PlotTheme -> "Detailed",
ChartLegends -> {Keys[aData3D], None},
ImageSize -> 1000
]

Remark: In the 3D bubble chart, “Mathematica” and “Wolfram Language” are easier to discern.


Second system effect traces

In this section we try (and fail) to demonstrate that the more programming languages a team of creators makes, the less successful those languages are. (Maybe because they are more cumbersome and suffer from the Second system effect?)

Remark: This section is mostly made “for fun.” Not every set of languages per creators team consists of comparable languages. For example, complementary languages can be in the same set (see HTTP, HTML, URL). Some sets consist of the same language under different names (see Perl 6 and Raku, and Mathematica and Wolfram Language). Also, older languages would have the First mover advantage.

Make a creators-to-index association (only for teams with more than one language):

aCreators = KeySort@Association[Rule @@@ Select[Tally[Normal@dsData[All, "creators"]], #[[2]] > 1 &]];
aNameToIndex = AssociationThread[Keys[aCreators], Range[Length[aCreators]]];

Make a bubble chart with relative popularity per creators team:

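(* Group the languages by creators team; within each team, scale the square roots of the users estimates by the team maximum *)
(* and place each language at {appeared, team index, relative size} with an informative tooltip *)
aNUsers = GroupBy[Normal@dsData, #creators &,
   Module[{m = Max[1, Sqrt[Lookup[#, "numberOfUsersEstimate"]]]},
     Map[Tooltip[{#appeared, #creators /. aNameToIndex, Sqrt[#numberOfUsersEstimate]/m},
        Grid[{{Style[#id, Black, Bold], SpanFromLeft}, {"Creator(s):", #creators}, {"Users:", #numberOfUsersEstimate}}, Alignment -> Left]] &, #]] &];
(* Keep only the teams with more than one language *)
aNUsers = KeySort@Select[aNUsers, Length[#] > 1 &];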
BubbleChart[aNUsers, AspectRatio -> 2, BubbleSizes -> {0.02, 0.05}, ChartLegends -> Keys[aNUsers], ImageSize -> Large, GridLines -> {None, Values[aNameToIndex]}, FrameTicks -> {{Reverse /@ (List @@@ Normal[aNameToIndex]), None}, {Automatic, Automatic}}]

From the plot above we cannot decisively say that:

The most recent creation of a team of programming language creators is not the team's most popular creation.

That statement, though, does hold for a fair number of cases.
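
The statement can be checked per team instead of by eye. A minimal sketch, assuming each row has the keys “appeared” and “numberOfUsersEstimate” (as the rows of dsData do); the helper newestNotMostPopularQ and the toy team are hypothetical, and ties are broken by taking the first maximal row:

```mathematica
(* True when a team's newest language is not its most popular one *)
Clear[newestNotMostPopularQ];
newestNotMostPopularQ[langs_List] :=
  Module[{newest, popular},
   newest = First@MaximalBy[langs, #appeared &];               (* latest "appeared" year *)
   popular = First@MaximalBy[langs, #numberOfUsersEstimate &]; (* largest users estimate *)
   newest =!= popular];

(* Toy team with made-up values: the older language has more users *)
toyTeam = {
   <|"id" -> "langA", "appeared" -> 1990, "numberOfUsersEstimate" -> 1000|>,
   <|"id" -> "langB", "appeared" -> 2005, "numberOfUsersEstimate" -> 10|>};

newestNotMostPopularQ[toyTeam]
(* True *)
```

With the ingested data, `Counts@Values@Map[newestNotMostPopularQ, Select[GroupBy[Normal@dsData, #creators &], Length[#] > 1 &]]` would tally the multi-language teams for which the statement holds. Rows with missing users estimates are not handled specially here and would need filtering first.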


Instead of conclusions

Consider:

  • Making an interactive interface for the variables, types of plots, etc.
  • Placing callouts for the focus languages in bubble charts.

Nightcore “Conflict” video making

Introduction

This notebook shows how to make a Nightcore modification of the animated film “Conflict” (1983, Soyuzmultfilm).

Remark: In Russian: “Конфликт” (1983), Союзмультфилм.

Remark: We use “Conflict” since its licencing allows copies of it to be (publicly) played via YouTube.

Remark: The notebook closely follows a previous post of mine about making a Nightcore version of the song “Schweine”.

The Nightcore transformation of the audio was fairly straightforward with Mathematica / WL. The video transformation and combination are also fairly straightforward.

Remark: Here is the final result uploaded to YouTube: https://youtu.be/C2TtkKfQa9I

Get movies

I downloaded the videos after searching yandex.ru (dzen.ru). Alternatively, one can find and download videos in Firefox or Google Chrome via relevant plugins. (Or use VLC, or utilize the paclet described in the post “Playing with YouTube from Mathematica”, [BMA1].)

At this point I have a small official video and a larger one. This gives the opportunity to demonstrate the transfer of the “Dolphin” signature from the “official” video to the larger one. (See the frame manipulation below.)

Here we import the downloaded video:

vdConflict0 = Import["~/Downloads/Conflict-Soviet-Animation.mp4"]

Make Nightcore audio

The process of making a song in the Nightcore style is described in Wikipedia, [Wk1]. Basically, we make the tempo 20-30% faster and raise the pitch by $\approx 5.5$ semitones.

Remark: An alternative to the process shown in this section is to use audio transformation programs like Audacity and Amadeus Pro.

Here we get the audio from the video:

auConflict = Audio[vdConflict0]

Here we change the tempo to be 20% faster:

AbsoluteTiming[ auConflictFaster = AudioTimeStretch[auConflict, 1/1.2] ]
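
The second argument of AudioTimeStretch is a stretch factor for the duration, so 1/1.2 shortens the audio to about 83% of its length, i.e., the tempo becomes 20% faster. A minimal check on a generated test tone; the 2-second, 440 Hz sine tone is an assumption for illustration only:

```mathematica
(* A 2-second, 440 Hz sine test tone *)
auTest = AudioGenerator[{"Sin", 440}, 2];

(* Stretch by 1/1.2, as done for the Nightcore tempo change above *)
auTestFaster = AudioTimeStretch[auTest, 1/1.2];

(* The duration should be about 2/1.2, i.e., roughly 1.67 seconds *)
QuantityMagnitude[Duration[auTestFaster], "Seconds"]
```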

Here we raise the pitch by $5.5$ semitones:

AbsoluteTiming[ auConflictNightcore = AudioPitchShift[auConflictFaster, Quantity[5.5, "Semitones"]] ]
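
Assuming equal temperament, a shift of $s$ semitones multiplies every frequency by $2^{s/12}$, so 5.5 semitones corresponds to a ratio of about 1.374. A minimal sketch; the helper semitoneRatio is hypothetical and not part of AudioPitchShift:

```mathematica
(* Frequency ratio of a pitch shift by s semitones (equal temperament) *)
Clear[semitoneRatio];
semitoneRatio[s_?NumericQ] := 2^(s/12);

N[semitoneRatio[5.5]]     (* about 1.374 *)
440*semitoneRatio[5.5]    (* a 440 Hz tone moves to about 604.5 Hz *)
```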

Direct video styling

If we only wanted to change how the video looks, we could directly manipulate the video frames with VideoFrameMap, [WRI6], and, say, ImageEffect, [WRI7]:

AbsoluteTiming[ vdConflict4 = VideoFrameMap[ImageEffect[#, "Tritanopia"] &, vdConflict0]; ] (*{75.6817, Null}*)

vdConflict4

Remark: Since we want to make both the audio and the video shorter, we have to work with the video frames.

Make Nightcore video

Get the frames of the video:

AbsoluteTiming[ lsFrames = VideoExtractFrames[vdConflict0, All]; ] (*{2.65153, Null}*)

Show the number of frames:

Length[lsFrames] (*10730*)

Generate an (audio-less) video from the list of frames, with the same duration as the generated Nightcore audio:

AbsoluteTiming[ vdConflictNew = VideoGenerator[lsFrames, Duration[auConflictNightcore]]; ] (*{56.1004, Null}*)
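
Assuming VideoGenerator spreads the given frames uniformly over the requested duration, the frame rate of the new video is the number of frames divided by the audio duration. Since the Nightcore audio is about 1/1.2 as long as the original, the same frames play roughly 1.2 times faster. A minimal sketch with the hypothetical helper impliedFrameRate:

```mathematica
(* Frame rate implied by spreading nFrames uniformly over the duration dur *)
Clear[impliedFrameRate];
impliedFrameRate[nFrames_Integer, dur_Quantity] :=
  N[nFrames/QuantityMagnitude[dur, "Seconds"]];

impliedFrameRate[100, Quantity[4, "Seconds"]]
(* 25. *)
```

For the notebook objects the call would be `impliedFrameRate[Length[lsFrames], Duration[auConflictNightcore]]`.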

Combine the video and audio (into a new video):

AbsoluteTiming[ vdConflictNightcore = VideoCombine[{vdConflictNew, auConflictNightcore}]; ] (*{0.144603, Null}*)

Here is the result:

vdConflictNightcore

Remark: Here we do not export the video, since Mathematica saves it in a “standard” location of the host operating system.

Cutting-off and re-splicing the movie credits

In order to better engage people from the Millennial and Gen Z generational cohorts, I want to move the movie credits from the start of the movie to the end. We use the functions VideoSplit, [WRI10], and VideoJoin, [WRI11].

Here we show frames that indicate where to split the obtained Nightcore movie:

VideoExtractFrames[vdConflictNightcore, {1, 30, 36, 37, 38}]

Here we split the video:

{v1, v2} = VideoSplit[vdConflictNightcore, 37];

In order to have a better visual flow, we color-invert the frames of the credits part:

v1Negative = VideoFrameMap[ColorNegate, v1];

Here we splice the “main” part with the “negated” credits part:

vdConflictNightcore2 = VideoJoin[v2, v1Negative]

References

[BMA1] b3m2ma1, “Playing with YouTube from Mathematica”, (2018), Wolfram Community. (GitHub link.)

[Wk1] Wikipedia entry, “Nightcore”.

[WRI1] Wolfram Research (2010), TextRecognize, Wolfram Language function, https://reference.wolfram.com/language/ref/TextRecognize.html (updated 2020).

[WRI2] Wolfram Research (2016), Audio, Wolfram Language function, https://reference.wolfram.com/language/ref/Audio.html (updated 2020).

[WRI3] Wolfram Research (2016), AudioTimeStretch, Wolfram Language function, https://reference.wolfram.com/language/ref/AudioTimeStretch.html (updated 2020).

[WRI4] Wolfram Research (2016), AudioPitchShift, Wolfram Language function, https://reference.wolfram.com/language/ref/AudioPitchShift.html (updated 2020).

[WRI5] Wolfram Research (2020), VideoExtractFrames, Wolfram Language function, https://reference.wolfram.com/language/ref/VideoExtractFrames.html.

[WRI6] Wolfram Research (2020), VideoFrameMap, Wolfram Language function, https://reference.wolfram.com/language/ref/VideoFrameMap.html (updated 2021).

[WRI7] Wolfram Research (2008), ImageEffect, Wolfram Language function, https://reference.wolfram.com/language/ref/ImageEffect.html (updated 13).

[WRI8] Wolfram Research (2020), VideoGenerator, Wolfram Language function, https://reference.wolfram.com/language/ref/VideoGenerator.html (updated 2021).

[WRI9] Wolfram Research (2020), VideoCombine, Wolfram Language function, https://reference.wolfram.com/language/ref/VideoCombine.html.

[WRI10] Wolfram Research (2020), VideoSplit, Wolfram Language function, https://reference.wolfram.com/language/ref/VideoSplit.html.

[WRI11] Wolfram Research (2020), VideoJoin, Wolfram Language function, https://reference.wolfram.com/language/ref/VideoJoin.html.

Wolfram Language conference in St. Petersburg

Two weeks ago (June 1st and 2nd) I participated in the Wolfram Language conference in St. Petersburg, Russia.

The conference was co-organized by Kirill Belov and Petr Genadievich Tenishev.

Link to the GitHub repository with my conference presentation materials.

Here is a mind-map of the potential presentations Kirill and I discussed:

System dynamics presentation

I presented “Динамические системы: Расширение и улучшение эпидемиологических моделей”
(in English: “Dynamic systems: extending and improving epidemiological models”).

The main presentation slides had a dozen supplements:

Making two presentations

Interestingly, I first prepared a Latent Semantic Analysis (LSA) talk, but then found out that the organizers had listed another talk I discussed with them, on extending dynamic systems models. (The latter was the first one we discussed, so it was my “fault” that I wanted to talk about LSA.)

Here are the presentation notebooks for LSA in English and Russian.

Some afterthoughts

  • The conference was very “strong”; my presentation was the “weakest.”
    • By “strong” I mean the content and style of the presentations.
  • This was also the first scientific presentation I gave in Russian. I also got a participation diploma.

to demonstrate generation of epidemiological modeling code.

  • Preparing for the conference reminded me of some of the COVID-19 modeling hackathons I participated in.
  • I prepared the initial models of artillery shell manufacturing, but much more work has to be done in order to have a meaningful article or presentation. (Hopefully, I will finish that soon.)
