I have been happily using the Accelerate framework to speed up some financial operations involving thousands of data points. Sadly, there is no `vDSP.sum(_:)` for signed integer numbers (only for floating-point numbers).

The naive `reduce()` implementation is *too slow* for my current purposes. Therefore, I decided to use this opportunity to learn how to use AVX intrinsics with Swift. So far I have this:

```
import Accelerate
import _Builtin_intrinsics.intel

extension vDSP {
    /// Returns the sum of the `Int32` elements of `vector`, wrapping on overflow.
    @_transparent static func sum<U>(_ vector: U) -> Int32 where U: AccelerateBuffer, U.Element == Int32 {
        vector.withUnsafeBufferPointer { (buffer) -> Int32 in
            // An empty buffer may have a nil base address, so bail out early.
            guard let base = buffer.baseAddress else { return 0 }
            let (iterations, remaining) = (buffer.count / 8, buffer.count % 8)
            // Vertical adds: accumulate eight Int32 lanes per unaligned 256-bit load.
            var result: Int32 = base.withMemoryRebound(to: __m256i.self, capacity: iterations) {
                var accumulator = _mm256_setzero_si256()
                for i in stride(from: 0, to: iterations, by: 1) {
                    let element = _mm256_loadu_si256($0 + i)
                    accumulator = _mm256_add_epi32(accumulator, element)
                }
                // Horizontal add: fold the eight lanes into a single scalar.
                let values = unsafeBitCast(accumulator, to: SIMD8<Int32>.self)
                return values[0] &+ values[1] &+ values[2] &+ values[3] &+ values[4] &+ values[5] &+ values[6] &+ values[7]
            }
            // Scalar tail: wrapping add, to match the overflow behavior of the vector part.
            for i in stride(from: 0, to: remaining, by: 1) {
                result &+= buffer[iterations * 8 + i]
            }
            return result
        }
    }
}
```

The current code has several shortcomings, and somehow I am unable to use some AVX2 intrinsics, such as `_mm256_extracti128_si256`. I would like to use some horizontal adds and extract parts of the values.
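For comparison, here is a minimal sketch of the same idea written with the standard library's portable `SIMD8<Int32>` type instead of the Intel intrinsics. The function name `simdWrappingSum` is hypothetical (it is not part of Accelerate), the sketch assumes Swift 5.7 or later for `loadUnaligned(fromByteOffset:as:)`, and whether it actually lowers to AVX2 instructions depends on the target features the compiler is given. The `lowHalf &+ highHalf` steps play the role that an extract-128-bits-then-add sequence would play with intrinsics.

```swift
/// Hypothetical helper (not part of Accelerate): wrapping sum of `Int32`
/// values using portable standard-library SIMD types.
/// Assumes Swift 5.7+ for `loadUnaligned(fromByteOffset:as:)`.
func simdWrappingSum(_ values: [Int32]) -> Int32 {
    values.withUnsafeBytes { raw -> Int32 in
        let lanes = SIMD8<Int32>.scalarCount                  // 8 lanes
        let count = raw.count / MemoryLayout<Int32>.stride    // number of elements
        let blocks = count / lanes                            // full 8-lane blocks
        var accumulator = SIMD8<Int32>()                      // all lanes zero

        // Vertical adds: one wrapping 8-lane add per block.
        for block in 0..<blocks {
            let offset = block * MemoryLayout<SIMD8<Int32>>.size
            accumulator &+= raw.loadUnaligned(fromByteOffset: offset, as: SIMD8<Int32>.self)
        }

        // Horizontal reduction by halving: 8 lanes -> 4 -> 2 -> 1.
        let four = accumulator.lowHalf &+ accumulator.highHalf
        let two = four.lowHalf &+ four.highHalf
        var result = two[0] &+ two[1]

        // Scalar tail for the elements that do not fill a whole block.
        for index in (blocks * lanes)..<count {
            let offset = index * MemoryLayout<Int32>.stride
            result &+= raw.loadUnaligned(fromByteOffset: offset, as: Int32.self)
        }
        return result
    }
}
```

Calling `simdWrappingSum(data)` should agree with `data.reduce(0, &+)` for any `[Int32]`, including the empty array, which makes the scalar version a convenient reference when testing.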

You can see the current compiler output on Godbolt.

Concretely, I have several questions:

- Is someone out there actively using vector extensions with Swift?
- How can I activate AVX2 compilation per function? The `-Xcc -Xclang -Xcc -target-feature -Xcc -Xclang -Xcc +avx2` flags seem *too heavy-handed* (likewise for setting the whole project's "Enable Additional Vector Extensions" attribute).
- What is the best way to implement an `Int32` sum with AVX2?