<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Nitin Gupta</title>
    <link>https://nitingupta.dev/</link>
    <description>Recent content on Nitin Gupta</description>
    <generator>Hugo</generator>
    <language>en</language>
    <managingEditor>ngupta@nitingupta.dev (Nitin Gupta)</managingEditor>
    <webMaster>ngupta@nitingupta.dev (Nitin Gupta)</webMaster>
    <lastBuildDate>Mon, 08 Dec 2025 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nitingupta.dev/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Tensor Deduplication for Multi-Model Inference</title>
      <link>https://nitingupta.dev/post/tensor-dedup/</link>
      <pubDate>Mon, 08 Dec 2025 00:00:00 +0000</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/tensor-dedup/</guid>
      <description>&lt;h2 id=&#34;summary&#34;&gt;Summary&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Multi-model workloads are the norm: A/B tests, customer fine-tunes, safety variants, multi-stage pipelines. GPU memory scales linearly with model count, and VRAM is the limiting resource.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Solution&lt;/strong&gt;: Tensor deduplication automatically identifies and shares bit-identical weight tensors across models, requiring no checkpoint modifications.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Results&lt;/strong&gt;: Across diffusion and LLM workloads, real-world savings range from &lt;strong&gt;3–32%&lt;/strong&gt;. DeepFloyd IF stages share 18.87 GB (32% reduction). Synthetic upper bound is 50%.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Overhead&lt;/strong&gt;: Hashing adds &amp;lt;1% to model load time. Zero runtime overhead since the forward pass is unchanged.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Compatibility&lt;/strong&gt;: Works with HuggingFace safetensors, GGUF, and Diffusers pipelines. No changes to training or checkpoints required.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;multi-model-memory-bloat&#34;&gt;Multi-Model Memory Bloat&lt;/h2&gt;&#xA;&lt;p&gt;Modern inference deployments rarely serve a single model. Production systems routinely load:&lt;/p&gt;</description>
    </item>
    <item>
      <title>Shared Backbones: Loading Weights Once, Serving Many Models</title>
      <link>https://nitingupta.dev/post/shared-backbones/</link>
      <pubDate>Sat, 29 Nov 2025 00:00:00 +0000</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/shared-backbones/</guid>
      <description>&lt;p&gt;I keep running into the same pattern when trying to self-host models (which is a lot of fun): we run several big models side by side, all of them valuable, all of them slightly different, and all of them wasting VRAM by reloading nearly the same weights.&lt;/p&gt;&#xA;&lt;p&gt;This post is my attempt to explore a specific idea:&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Can we load a shared backbone of weights once on a GPU, then load only the small, unique pieces per model that reuse that backbone?&lt;/strong&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Proactive Compaction</title>
      <link>https://nitingupta.dev/post/proactive-compaction/</link>
      <pubDate>Sat, 07 Mar 2020 22:33:52 -0800</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/proactive-compaction/</guid>
      <description>&lt;p&gt;&lt;em&gt;This feature has now been &lt;a href=&#34;https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/compaction.c?id=facdaa917c4d5a376d09d25865f5a863f906234a&#34;&gt;&lt;strong&gt;accepted and merged&lt;/strong&gt;&lt;/a&gt; in the upstream kernel and will be part of kernel release 5.9. This post has been updated to match the upstream version of this feature.&lt;/em&gt;&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;p&gt;In my &lt;a href=&#34;https://nitingupta.dev/post/linux-kernel-hugepage-allocation-latencies/&#34;&gt;previous post&lt;/a&gt;, I described how the on-demand compaction scheme hurts hugepage allocation latencies on Linux. To improve the situation, I have been working on Proactive Compaction for the Linux kernel, which tries to reduce higher-order allocation latencies by compacting memory in the background.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Linux kernel hugepage allocation latencies</title>
      <link>https://nitingupta.dev/post/linux-kernel-hugepage-allocation-latencies/</link>
      <pubDate>Tue, 04 Feb 2020 00:00:00 +0000</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/linux-kernel-hugepage-allocation-latencies/</guid>
      <description>&lt;p&gt;Some drivers need to allocate almost all memory as hugepages to reduce (on-device or CPU) TLB pressure. However, on a running system, higher-order allocations can fail if the memory is fragmented. The Linux kernel can do &lt;strong&gt;on-demand compaction&lt;/strong&gt; as we request more hugepages, but this style of compaction incurs very high latency.&lt;/p&gt;&#xA;&lt;p&gt;To show the effect of on-demand compaction on hugepage allocation latency, I created a test program &amp;ldquo;frag&amp;rdquo; which allocates almost all available system memory and then frees $\frac{3}{4}$ of the pages from each hugepage-sized aligned chunk. This allocation pattern results in a ~300% fragmented address space w.r.t. order 9, i.e. the physical mappings of our VA space are spread over 3x the number of hugepage-aligned chunks than what is ideally required.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A layered object store design in Elixir (Part VI)</title>
      <link>https://nitingupta.dev/post/a-layered-object-store-design-in-elixir-part6/</link>
      <pubDate>Fri, 24 Jan 2020 00:00:00 +0000</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/a-layered-object-store-design-in-elixir-part6/</guid>
      <description>&lt;p&gt;We built an object store from scratch in Elixir using a layered design approach. The overall theme has been to avoid generalizing the design too much which kept implementation of each layer/module simple. We were also careful when adding any third-party dependencies which has multiple advantages: deeper understanding of your codebase, easier debugging (I hate unknown code-paths in backtraces).&lt;/p&gt;&#xA;&lt;p&gt;For reference, here are links for all five parts along with their summaries:&lt;/p&gt;</description>
    </item>
    <item>
      <title>A layered object store design in Elixir (Part V)</title>
      <link>https://nitingupta.dev/post/a-layered-object-store-design-in-elixir-part5/</link>
      <pubDate>Thu, 23 Jan 2020 00:00:00 +0000</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/a-layered-object-store-design-in-elixir-part5/</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://nitingupta.dev/post/a-layered-object-store-design-in-elixir-part1/&#34;&gt;Part I&lt;/a&gt; introduces the overall design of our object store. In this post we focus on the Web layer. This is the final layer of our object store, responsible for exposing it over the web. It will expose two endpoints: &lt;code&gt;/upload&lt;/code&gt; for uploading a file and &lt;code&gt;/file/:file_id&lt;/code&gt; for getting a file by ID. A typical GraphQL application will also expose a &lt;code&gt;/graphql&lt;/code&gt; endpoint which plugs directly into your API layer; however, I will not discuss this part and will stay focused on the object store side of things.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A layered object store design in Elixir (Part IV)</title>
      <link>https://nitingupta.dev/post/a-layered-object-store-design-in-elixir-part4/</link>
      <pubDate>Wed, 22 Jan 2020 00:00:00 +0000</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/a-layered-object-store-design-in-elixir-part4/</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://nitingupta.dev/post/a-layered-object-store-design-in-elixir-part1/&#34;&gt;Part I&lt;/a&gt; introduces the overall design of our object store. In this post we focus on the API layer. All layers so far were concerned only with storing the input file together with some file-format specific transforms (like thumbnails). It is at the API layer where we will be storing per-file system and user metadata. This metadata can be used to support application specific business logic and security policies.&lt;/p&gt;&#xA;&lt;p&gt;This layer will depend on all per-file-format modules: &lt;code&gt;ImageStore&lt;/code&gt;, &lt;code&gt;VideoStore&lt;/code&gt;, etc. We will use Postgres for storing per-file metadata, so we also depend on the &lt;a href=&#34;https://hex.pm/packages/postgrex&#34;&gt;postgrex&lt;/a&gt; package. A typical API layer will also expose a GraphQL interface which forms the core of application specific business logic. I am not going to include an example GraphQL interface here, but &lt;a href=&#34;https://hex.pm/packages/absinthe&#34;&gt;absinthe&lt;/a&gt; would be my preferred way of doing it, anytime.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A layered object store design in Elixir (Part III)</title>
      <link>https://nitingupta.dev/post/a-layered-object-store-design-in-elixir-part3/</link>
      <pubDate>Mon, 20 Jan 2020 00:00:00 +0000</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/a-layered-object-store-design-in-elixir-part3/</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://nitingupta.dev/post/a-layered-object-store-design-in-elixir-part2/&#34;&gt;Part I&lt;/a&gt; layer.&lt;/p&gt;&#xA;&lt;h1 id=&#34;imagestore&#34;&gt;ImageStore&lt;/h1&gt;&#xA;&lt;p&gt;The ImageStore module is responsible for storing images along with their thumbnails. It will use the FileStore layer to actually store files on disk. Before we define module interfaces, let&amp;rsquo;s see our application requirements:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;All images must be stored in the &lt;code&gt;jpg&lt;/code&gt; format.&lt;/li&gt;&#xA;&lt;li&gt;Images cannot be larger than 1920x1080. We do not want to store the user-provided version at all.&lt;/li&gt;&#xA;&lt;li&gt;Thumbnails should use the same &lt;code&gt;jpg&lt;/code&gt; format.&lt;/li&gt;&#xA;&lt;li&gt;All thumbnails must have the same size of 256x256.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;Note that we are going for highly application specific requirements rather than a more general, configurable design. I have seen that most of the complexity in software stacks comes from the temptation of making them &amp;ldquo;reusable&amp;rdquo;. As you will see, the implementation is going to be so simple, with clearly defined interfaces, that it would be much easier for you to create such a module for each of your applications, with its specific requirements baked in.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A layered object store design in Elixir (Part II)</title>
      <link>https://nitingupta.dev/post/a-layered-object-store-design-in-elixir-part2/</link>
      <pubDate>Mon, 13 Jan 2020 00:00:00 +0000</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/a-layered-object-store-design-in-elixir-part2/</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://nitingupta.dev/post/a-layered-object-store-design-in-elixir-part1/&#34;&gt;Part I&lt;/a&gt; introduces the overall design of our object store. In this post we focus on its first layer, the &lt;code&gt;FileStore&lt;/code&gt;.&lt;/p&gt;&#xA;&lt;p&gt;The FileStore layer is responsible for actually storing the file in our object store. At this level, we are not concerned with what kind of file it is (image, video, document, or whatever else), nor do we have any notion of security. We just store whatever input path is given to us.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A layered object store design in Elixir (Part I)</title>
      <link>https://nitingupta.dev/post/a-layered-object-store-design-in-elixir-part1/</link>
      <pubDate>Sun, 12 Jan 2020 00:00:00 +0000</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/a-layered-object-store-design-in-elixir-part1/</guid>
      <description>&lt;p&gt;I recently designed an object store from scratch in Elixir. It has been serving me well as a backend for an app which needs to store all kinds of files: images, videos, documents. I wanted something simple to avoid dealing with off-the-shelf object stores which require complex configurations and to avoid cloud storage which is dead simple to use but can get very expensive, very quickly. For this project, simplicity was the key to make sure I can debug any failures quickly.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Elixir collections</title>
      <link>https://nitingupta.dev/post/elixir-collections/</link>
      <pubDate>Thu, 02 Jan 2020 00:00:00 +0000</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/elixir-collections/</guid>
      <description>&lt;p&gt;Elixir is a functional programming language that I have been using a lot in recent months to build all kinds of applications. Understanding the built-in collection types is essential to using any language effectively, and Elixir is no different.&lt;/p&gt;&#xA;&lt;p&gt;This post summarizes all collection types along with pros/cons/gotchas for each one of them.&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;&lt;strong&gt;Collection&lt;/strong&gt;&lt;/th&gt;&#xA;          &lt;th&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/th&gt;&#xA;          &lt;th&gt;&lt;strong&gt;When&lt;/strong&gt;&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Tuples&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;{:ok, &amp;quot;All good&amp;quot;}&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;Returning data from a function&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Lists&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;[1, &amp;quot;two&amp;quot;, :three]&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;For a collection of items&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Keyword lists&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;[one: 1, two: 2]&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;Passing options to a function&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Maps&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;%{one: 1, two: 2}&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;Flexible key/value store&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Structs&lt;/td&gt;&#xA;          &lt;td&gt;&lt;code&gt;%User{name: &amp;quot;John&amp;quot;, age: 32}&lt;/code&gt;&lt;/td&gt;&#xA;          &lt;td&gt;Typed/fixed key/value store&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;h1 
id=&#34;tuples&#34;&gt;Tuples&lt;/h1&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;{:ok, foo}&lt;/code&gt; is a &amp;ldquo;tagged&amp;rdquo; tuple since it begins with an atom like &lt;code&gt;:ok&lt;/code&gt; or &lt;code&gt;:error&lt;/code&gt;, as in &lt;code&gt;{:error, 543, &amp;quot;some error&amp;quot;}&lt;/code&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;Examples:&lt;/p&gt;</description>
    </item>
    <item>
      <title>Subtle Errors in C&#43;&#43; Programs</title>
      <link>https://nitingupta.dev/post/subtle-errors-in-cpp-programs/</link>
      <pubDate>Wed, 24 Apr 2019 15:21:51 -0700</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/subtle-errors-in-cpp-programs/</guid>
      <description>&lt;p&gt;I recently stumbled upon a subtle bug in a benchmark &lt;a href=&#34;https://gist.github.com/apurvam/6803958&#34;&gt;code&lt;/a&gt; which again reminds me to never use C++ again, if I can.&lt;/p&gt;&#xA;&lt;p&gt;Here&amp;rsquo;s a buggy snippet from this code (simplified):&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-c++&#34; data-lang=&#34;c++&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;// BUGGY&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;ostringstream os;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; i &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;os &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;foo-&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&amp;lt;&lt;/span&gt; i &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;.dat&amp;#34;&lt;/span&gt;;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;const&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;char&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;filename &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; os.str().c_str();&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; fd &lt;span 
style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; open(filename, O_RDONLY);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You might expect the above code to try to open a file named &lt;code&gt;foo-1.dat&lt;/code&gt;, but that&amp;rsquo;s not what happens here.&lt;/p&gt;&#xA;&lt;p&gt;In this snippet, &lt;code&gt;os.str()&lt;/code&gt; creates a temporary &lt;code&gt;string&lt;/code&gt; object which is destroyed at the end of that statement, right after the call to the &lt;code&gt;c_str()&lt;/code&gt; method. So, &lt;code&gt;filename&lt;/code&gt; ends up pointing to freed memory, which can of course contain arbitrary content (until a &lt;code&gt;NUL&lt;/code&gt; byte is reached).&lt;/p&gt;</description>
    </item>
    <item>
      <title>Setting Up Backup Snapshots on Linux</title>
      <link>https://nitingupta.dev/post/setting-up-backup-snapshots-on-linux/</link>
      <pubDate>Wed, 24 Apr 2019 00:00:00 +0000</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/setting-up-backup-snapshots-on-linux/</guid>
      <description>&lt;p&gt;For some time I&amp;rsquo;ve been looking for a backup solution for Linux that can periodically take snapshots of data, allowing me to go back in the history of any file, just like git. I finally found &lt;a href=&#34;https://restic.net/&#34;&gt;restic&lt;/a&gt;, which fits these requirements. Here is how I set it up to take snapshots of particular directories, say every 15 minutes.&lt;/p&gt;&#xA;&lt;h2 id=&#34;installing-restic&#34;&gt;Installing restic&lt;/h2&gt;&#xA;&lt;p&gt;Though restic is available in the repositories of almost all Linux distros, I recommend downloading the &lt;a href=&#34;https://github.com/restic/restic/releases/&#34;&gt;latest release&lt;/a&gt; directly from GitHub to avoid dealing with a potentially outdated version. Helpfully, you can stay current with restic releases using &lt;code&gt;restic self-update&lt;/code&gt;.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Google Drive on Linux</title>
      <link>https://nitingupta.dev/post/google-drive-on-linux/</link>
      <pubDate>Mon, 22 Apr 2019 00:00:00 +0000</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/google-drive-on-linux/</guid>
      <description>&lt;p&gt;There is no official Google Drive client for Linux. I tried many different clients found all over GitHub but none of them worked reliably for me except &lt;a href=&#34;https://github.com/ncw/rclone&#34;&gt;rclone&lt;/a&gt;. I also tried third-party proprietary clients like &lt;a href=&#34;https://www.insynchq.com/&#34;&gt;Insync&lt;/a&gt;, but allowing read-write access to all your Google Drive files to a closed source blob is too much to swallow.&lt;/p&gt;&#xA;&lt;p&gt;One caveat with &lt;code&gt;rclone&lt;/code&gt; is that it does not natively support bi-directional sync (&lt;a href=&#34;https://github.com/ncw/rclone/issues/118&#34;&gt;GitHub issue&lt;/a&gt;), but someone developed a Python script, &lt;a href=&#34;https://github.com/cjnaz/rclonesync-V2&#34;&gt;rclonesync-V2&lt;/a&gt;, a wrapper around &lt;code&gt;rclone&lt;/code&gt; that does the job. With these two pieces of software we can get a close-to-official Google Drive client experience.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Faster compilation with distcc</title>
      <link>https://nitingupta.dev/post/faster-compilation-with-distcc/</link>
      <pubDate>Sat, 19 Jun 2010 05:22:00 -0700</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/faster-compilation-with-distcc/</guid>
      <description>&lt;p&gt;Often, you have more than one system at your disposal but no clear way&#xA;of distributing your compilation workloads over to all or some of them.&#xA;They might be running different OSes which makes it look even more&#xA;difficult. In my case, I have one laptop (2 cores) and a desktop (4&#xA;cores) connected with a WiFi network. The laptop runs Linux (Fedora 13&#xA;64-bit) while the desktop runs Windows 7 (64-bit). I wanted to somehow&#xA;offload Linux kernel compilation over to my powerful desktop and keep my&#xA;laptop cool :)&lt;/p&gt;</description>
    </item>
    <item>
      <title>Compressed RAM disk for Windows, The Virtual Way!</title>
      <link>https://nitingupta.dev/post/compressed-ram-disk-for-windows-the-virtual-way/</link>
      <pubDate>Sun, 30 May 2010 06:53:00 -0700</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/compressed-ram-disk-for-windows-the-virtual-way/</guid>
      <description>&lt;p&gt;Recently, I developed a Linux kernel driver which creates generic RAM&#xA;based compressed block devices (called &lt;strong&gt;zram&lt;/strong&gt;). Being RAM disks, they&#xA;do not provide persistent storage but there are many use cases where&#xA;persistence is not required: /tmp, various caches under /var, swap disks&#xA;etc. These cases can benefit greatly from high speed RAM disks along&#xA;with the savings which compression brings!&lt;/p&gt;&#xA;&lt;p&gt;However, all this seems to be completely Linux centric. But with&#xA;virtualization, zram can be used for Windows too! The trick is to expose&#xA;zram as a ‘raw disk’ to Windows running inside a Virtual Machine (VM). I&#xA;will be using VirtualBox as an example, but exposing raw disks should be&#xA;supported by other virtualization solutions like VMware and KVM too.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Comprehensive graphical Git diff viewer</title>
      <link>https://nitingupta.dev/post/comprehensive-graphical-git-diff-viewer/</link>
      <pubDate>Sun, 27 Dec 2009 06:28:00 -0800</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/comprehensive-graphical-git-diff-viewer/</guid>
      <description>&lt;p&gt;For a long time, I have been looking for a graphical git diff viewer which&#xA;could show the original and modified files side-by-side and highlight the&#xA;changes. There are a few solutions but none of them is sufficient:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;A tool included with git called &amp;lsquo;git-difftool&amp;rsquo; is partially helpful&#xA;&amp;ndash; it can show changes graphically but the diff for each file is shown&#xA;one-by-one. This is very irritating. In fact, it is unusable even with&#xA;just 10-15 files.&lt;/li&gt;&#xA;&lt;li&gt;Another alternative is the &lt;a href=&#34;http://meld.sourceforge.net/&#34;&gt;meld diff&#xA;viewer&lt;/a&gt; which is &amp;ldquo;git aware&amp;rdquo;. The&#xA;problem here is that it can show the diff for uncommitted changes only,&#xA;which is very limiting. What if you want to see what changed between two&#xA;Linux kernel versions, say 2.6.33-rc1 and 2.6.33-rc2? Or the changes between the last&#xA;two commits? meld cannot do it, AFAIK.&lt;/li&gt;&#xA;&lt;li&gt;Finally, with &lt;a href=&#34;http://www.caffeinated.me.uk/kompare/&#34;&gt;kompare&lt;/a&gt;, you&#xA;can do something like: &amp;lsquo;git diff master | kompare -o -&amp;rsquo;. This method,&#xA;however, does not show the original and new files side-by-side. It is&#xA;simply prettier diff highlighting.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;None of the above methods is sufficient. So, I wrote the following script&#xA;which solves our problem: it shows the complete contents of the original and new&#xA;files side-by-side and highlights the differences.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Linux kernel workflow with Git</title>
      <link>https://nitingupta.dev/post/linux-kernel-workflow-with-git/</link>
      <pubDate>Wed, 23 Sep 2009 10:21:00 -0700</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/linux-kernel-workflow-with-git/</guid>
      <description>&lt;p&gt;You worked on some part of Linux kernel. It works great. Now, how to&#xA;generate the patch series and send it out for review? For this, I always&#xA;used to generate diffs, create a set of draft mails (one for each patch)&#xA;in KMail or Thunderbird, and send all these mails one-by-one. This&#xA;workflow quickly became a big headache. Then I learned Git (and some&#xA;related tools) to do all this from command line and wow! what a&#xA;relief!&lt;/p&gt;</description>
    </item>
    <item>
      <title>ccache to speed-up Linux kernel compile</title>
      <link>https://nitingupta.dev/post/ccache-to-speed-up-linux-kernel-compile/</link>
      <pubDate>Thu, 19 Mar 2009 14:40:00 -0700</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/ccache-to-speed-up-linux-kernel-compile/</guid>
      <description>&lt;p&gt;In case you are unfamiliar with ccache, it&amp;rsquo;s a &amp;ldquo;compiler cache&amp;rdquo;.&#xA;Compiling is primarily a CPU-intensive task. ccache caches compiled&#xA;objects so that the next time we compile the same code, it reuses these objects,&#xA;thereby &lt;span style=&#34;font-weight: bold;&#34;&gt;significantly&lt;/span&gt;&#xA;speeding up compilation.&lt;/p&gt;&#xA;&lt;p&gt;I usually need to recompile the Linux kernel several times a day, with&#xA;different permutations of config settings. This almost forces a &amp;lsquo;make&#xA;clean&amp;rsquo; or &amp;lsquo;make mrproper&amp;rsquo;, which deletes all compiled objects in the build&#xA;tree, and then we have to rebuild everything all over again. This takes&#xA;an enormous amount of time. ccache comes to the rescue! I&amp;rsquo;m surprised I&#xA;didn&amp;rsquo;t use it earlier.&lt;/p&gt;</description>
    </item>
    <item>
      <title>SLOB memory allocator</title>
      <link>https://nitingupta.dev/post/slob-memory-allocator/</link>
      <pubDate>Wed, 18 Mar 2009 18:08:00 -0700</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/slob-memory-allocator/</guid>
      <description>&lt;p&gt;The Linux kernel includes a few SLAB allocator variants: SLAB, SLUB and&#xA;SLOB. Of these, SLOB is especially meant to be used on embedded devices&#xA;&amp;ndash; it tries to be more memory space efficient than other SLAB&#xA;variants.&lt;/p&gt;&#xA;&lt;p&gt;Yesterday, I had a detailed look at the SLOB allocator for possible use in the&#xA;&lt;a href=&#34;http://code.google.com/p/compcache/&#34;&gt;compcache project&lt;/a&gt; and found it&#xA;unacceptable for the purpose. I did this in response to feedback on the&#xA;&lt;a href=&#34;http://code.google.com/p/compcache/wiki/xvMalloc&#34;&gt;xvmalloc&lt;/a&gt; allocator&#xA;&amp;ndash; as part of the compcache patches posted for inclusion in the mainline Linux&#xA;kernel:&#xA;&lt;a href=&#34;http://lkml.org/lkml/2009/3/17/116&#34;&gt;http://lkml.org/lkml/2009/3/17/116&lt;/a&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Anti-tip of the month</title>
      <link>https://nitingupta.dev/post/anti-tip-of-the-month/</link>
      <pubDate>Wed, 18 Mar 2009 17:48:00 -0700</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/anti-tip-of-the-month/</guid>
      <description>&lt;p&gt;Very old but still as relevant&amp;hellip; and very interesting too! Directly go&#xA;to &amp;ldquo;anti-tip&amp;rdquo; section of &lt;a href=&#34;http://www.mactech.com/articles/mactech/Vol.11/11.10/Oct95Tips/index.html&#34;&gt;this article&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;&lt;em&gt;&amp;ldquo;The moral of the story is: don’t get tricky. C programmers often try to minimize the number of lines of C in their program without consideration for what the compiler will generate. When in doubt, write clear code and give the optimizer a chance to maximize performance. Look at the compiler output. Your code will be easier to debug and probably faster too.&amp;rdquo;&lt;/em&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Fedora 10 instability issue solved!</title>
      <link>https://nitingupta.dev/post/fedora-10-instability-issue-solved/</link>
      <pubDate>Wed, 18 Mar 2009 17:35:00 -0700</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/fedora-10-instability-issue-solved/</guid>
      <description>&lt;p&gt;One of my Fedora 10 systems used to freeze very frequently. After a lot of&#xA;looking around I found it&amp;rsquo;s because of &amp;ldquo;KWin Compositing&amp;rdquo;, which gives&#xA;OpenGL driven special effects for the desktop. Unfortunately, Linux has&#xA;always been bad at radeon drivers, so it is better to disable these effects,&#xA;especially if you have radeon video cards.&lt;/p&gt;&#xA;&lt;p&gt;In ~/.kde/share/config/kwinrc:&lt;/p&gt;&#xA;&lt;p&gt;in the [Compositing] section, change&#xA;Enabled=true to Enabled=false.&lt;/p&gt;&#xA;&lt;p&gt;Reboot after this change. Now I never get any system freeze - as is&#xA;expected from a solid Linux system :)&lt;/p&gt;</description>
    </item>
    <item>
      <title>Difference Engine - Harnessing Memory Redundancy in Virtual Machines</title>
      <link>https://nitingupta.dev/post/difference-engine-harnessing-memory-redundancy-in-virtual-machines/</link>
      <pubDate>Fri, 13 Mar 2009 05:30:00 -0700</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/post/difference-engine-harnessing-memory-redundancy-in-virtual-machines/</guid>
      <description>&lt;p&gt;Here is a link to the paper&#xA;(&lt;a href=&#34;http://www.usenix.org/events/osdi08/tech/full_papers/gupta/gupta.pdf&#34;&gt;pdf&lt;/a&gt;)&#xA;(&lt;a href=&#34;http://www.usenix.org/media/events/osdi08/tech/mp3s/gupta.mp3&#34;&gt;MP3&lt;/a&gt;)&lt;/p&gt;&#xA;&lt;p&gt;Recently I came across this paper published in &lt;a href=&#34;http://www.usenix.org/events/osdi08/&#34;&gt;OSDI&#xA;&amp;rsquo;08&lt;/a&gt;. It&amp;rsquo;s an extension to VMware&amp;rsquo;s&#xA;page-sharing and shows some amazing and &lt;span&#xA;style=&#34;font-weight: bold;&#34;&gt;hard to believe&lt;/span&gt; results. VMware&amp;rsquo;s&#xA;page-sharing mechanism scans the memory of all VMs and maps pages with the&#xA;&lt;span style=&#34;font-weight: bold;&#34;&gt;same&lt;/span&gt; contents to a single page.&#xA;This achieves memory savings if multiple hosted VMs run the same OS.&#xA;The technique discussed in this paper, however, also finds pages that are&#xA;&lt;span style=&#34;font-weight: bold;&#34;&gt;nearly the same&lt;/span&gt;. For such pages,&#xA;it saves a &lt;span style=&#34;font-style: italic;&#34;&gt;base page&lt;/span&gt; and stores the other similar pages as&#xA;&lt;span style=&#34;font-style: italic;&#34;&gt;deltas&lt;/span&gt; of the base page. Pages&#xA;which are not similar to any other page are simply compressed.&#xA;Their benchmarks show up to 45% more memory savings than ESX page-sharing&#xA;under some (specially crafted) workloads.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Nitin Gupta</title>
      <link>https://nitingupta.dev/page/about/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>ngupta@nitingupta.dev (Nitin Gupta)</author>
      <guid>https://nitingupta.dev/page/about/</guid>
      <description>&lt;p&gt;I am deeply passionate about optimizing GPU performance and delving into the intricacies of resolving bottlenecks within render and compute workloads.&lt;/p&gt;&#xA;&lt;p&gt;With nearly 15 years of experience delving into the nitty-gritty details of technology, I&amp;rsquo;ve dedicated a significant portion of my career to working on various low-level components, including Linux Kernel Proactive Compaction (&lt;a href=&#34;https://lwn.net/Articles/817905/&#34;&gt;LWN.net article&lt;/a&gt;, &lt;a href=&#34;https://www.phoronix.com/news/Proactive-Mem-Compact-Non-RFC&#34;&gt;Phoronix article&lt;/a&gt;), &lt;a href=&#34;https://docs.kernel.org/admin-guide/blockdev/zram.html&#34;&gt;zram&lt;/a&gt;, &lt;a href=&#34;https://docs.kernel.org/mm/zsmalloc.html&#34;&gt;zsmalloc&lt;/a&gt;, etc. to the Linux kernel, with a particular emphasis on scalability and performance enhancements.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
