
SoftMax: Multiply by 1 / expSum instead of dividing by expSum #125269

Closed

BarionLP wants to merge 1 commit into dotnet:main from BarionLP:barion-softmax-multiply

Conversation

@BarionLP
Contributor

@BarionLP BarionLP commented Mar 6, 2026

System.Numerics.Tensors.TensorPrimitives.SoftMax:
We can replace a bunch of per-element divisions with multiplications by computing 1 / expSum once and then multiplying each element by it.
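As a minimal sketch of the idea (scalar code only; the actual PR changes the vectorized TensorPrimitives path, and names like `invExpSum` are illustrative, not taken from the PR):

```csharp
// Scalar sketch of the reciprocal trick; the real implementation is vectorized.
static void SoftMax(ReadOnlySpan<float> x, Span<float> destination)
{
    float expSum = 0f;
    for (int i = 0; i < x.Length; i++)
    {
        destination[i] = MathF.Exp(x[i]);
        expSum += destination[i];
    }

    // Before: one division per element (destination[i] /= expSum).
    // After: compute the reciprocal once, then multiply.
    float invExpSum = 1f / expSum;   // illustrative name, not from the PR
    for (int i = 0; i < destination.Length; i++)
    {
        destination[i] *= invExpSum;
    }
}
```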

I assume this will slightly change the output of SoftMax but since float operations aren't consistent across platforms this should not matter.

Benchmark results

I assume this largely depends on the hardware.

| Method  | Count   | Mean         | Error       | StdDev      | Ratio | Allocated | Alloc Ratio |
|-------- |-------- |-------------:|------------:|------------:|------:|----------:|------------:|
| BuiltIn | 1000    |     242.2 ns |     0.65 ns |     0.61 ns |  1.00 |         - |          NA |
| Mine    | 1000    |     228.1 ns |     0.06 ns |     0.05 ns |  0.94 |         - |          NA |
|         |         |              |             |             |       |           |             |
| BuiltIn | 1000000 | 309,299.3 ns |   736.81 ns |   689.21 ns |  1.00 |         - |          NA |
| Mine    | 1000000 | 301,680.0 ns | 1,214.74 ns | 1,136.27 ns |  0.98 |         - |          NA |

code: https://gist.github.com/BarionLP/aff1bca0d507dfb16f52bb715e3a58a2

Late follow-up to #111615.

The dotnet-policy-service bot added the community-contribution label (Indicates that the PR has been added by a community member) on Mar 6, 2026
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-numerics-tensors
See info in area-owners.md if you want to be subscribed.

@BarionLP
Contributor Author

BarionLP commented Mar 7, 2026

I think the build failures are unrelated?

@BarionLP
Contributor Author

BarionLP commented Mar 7, 2026

Would it make sense to change the implementation of Divide too?

    public static void Divide(ReadOnlySpan<float> x, float y, Span<float> destination) =>
        InvokeSpanScalarIntoSpan<MultiplyOperator_Single>(x, 1 / y, destination);

Is there a reason SoftMax calls InvokeSpanScalarIntoSpan directly instead of using Divide?

@BarionLP
Contributor Author

BarionLP commented Mar 7, 2026

I just realized that if x and y are very large, the difference between x / y and x * (1 / y) can be quite big (x * (1 / y) could even become zero), so maybe this is not a good idea after all.
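For illustration, a small sketch of why large magnitudes hurt: with a huge divisor, 1 / y lands in the subnormal range, where a float carries fewer significant bits, so the product can drift from the correctly rounded quotient (values and names here are my own, not from the PR):

```csharp
float y = float.MaxValue;   // ~3.4028235e38
float inv = 1f / y;         // ~2.9e-39: subnormal, so fewer significant bits survive

float x = 1e30f;
Console.WriteLine(x / y);   // correctly rounded quotient
Console.WriteLine(x * inv); // multiplies by the rounded reciprocal; can differ from x / y
```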

@tannergooding
Member

I assume this will slightly change the output of SoftMax but since float operations aren't consistent across platforms this should not matter.

This is not true. IEEE 754 floating-point is deterministic, by design. This is particularly true for all primitive operations such as +, -, *, /, Sqrt, and a few operations that are listed as "required" by the spec.

The sources of such "differences" are typically users improperly opting into features like fast-math, legacy code from when the x87 FPU was prominent and users did not correctly set the rounding mode before doing operations (and so were doing 80-bit, not 64-bit or 32-bit, operations), or "recommended" operations (such as Sin) for which the underlying C runtime implementations explicitly opted for perf over accuracy.

If a user wishes for a different result, they can explicitly use the relevant operations themselves. Our own implementations are striving for accuracy first and foremost, particularly when that is trivial to achieve.


Labels

area-System.Numerics.Tensors, community-contribution (Indicates that the PR has been added by a community member)
