help testing a script to determine stage arches.

loonylion · May 16, 2023

Hi funtoo users,

Having lurked around the bug tracker for a while, I noticed the bug filed by user @kery about their westmere that isn't a westmere, which was subsequently mentioned by drobbins in the February newsletter. Having read an article elsewhere in the past about gcc sometimes doing funny things with -march and -mtune, it occurred to me to try to create a script that compares the CPU flags with what gcc expects for each -march value used by Funtoo stage3 archives, and recommends the most suitable one (i.e. the one with the most matching instruction sets).

I have done this, creating a command-line Python 3 script (be nice, it's my first major Python project).

There was also going to be a webpage version, but it spontaneously, inexplicably and apparently unfixably broke for no apparent reason. (TECH DETAILS: PHP8 suddenly decided that the $_POST array key isn't defined, and that's now a fatal problem apparently. It is defined and there is data in it, which PHP is apparently ignoring. It had worked for a week beforehand.)

The Python CLI script (which directly reads /proc/cpuinfo and also queries the local gcc) is available on the bug tracker, bug FL-11330: https://bugs.funtoo.org/browse/FL-11330. I am willing to make it more public when I figure out the best/most appropriate way of doing so and after it's had a bit more testing/validation. Script should work on python >= 3.7, but has been tested on 3.9.
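For anyone curious about the general idea before downloading it, here is a minimal sketch of the two inputs mentioned above: the flags line from /proc/cpuinfo and the options gcc reports as enabled for a given -march. This is not the script attached to FL-11330. The function names and the small CPUINFO_TO_GCC table are hypothetical, and the table only covers a handful of the naming differences between the kernel and gcc.

```python
import re
import subprocess

# Illustrative, incomplete mapping from /proc/cpuinfo names to gcc -m names.
# (Assumption: a real tool would need a much longer table than this.)
CPUINFO_TO_GCC = {
    "pni": "sse3",
    "sse4_1": "sse4.1",
    "sse4_2": "sse4.2",
    "pclmulqdq": "pclmul",
    "rdrand": "rdrnd",
    "lahf_lm": "sahf",
}


def parse_cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def to_gcc_names(cpu_flags):
    """Translate kernel flag names to gcc names where the table knows them."""
    return {CPUINFO_TO_GCC.get(flag, flag) for flag in cpu_flags}


def parse_gcc_enabled(help_target_text):
    """Return the -m options marked [enabled] in 'gcc -Q --help=target' output."""
    pattern = re.compile(r"^\s*-m(\S+)\s+\[enabled\]\s*$")
    enabled = set()
    for line in help_target_text.splitlines():
        match = pattern.match(line)
        if match:
            enabled.add(match.group(1))
    return enabled


def query_gcc(march):
    """Ask the local gcc which -m options it enables for one -march value."""
    result = subprocess.run(
        ["gcc", f"-march={march}", "-Q", "--help=target"],
        capture_output=True, text=True, check=True,
    )
    return parse_gcc_enabled(result.stdout)


if __name__ == "__main__":
    with open("/proc/cpuinfo") as handle:
        cpu = to_gcc_names(parse_cpu_flags(handle.read()))
    wanted = query_gcc("skylake")
    print(f"{len(cpu & wanted)} / {len(wanted)} flags intersect")
```

The parsing is kept separate from the gcc call so it can be checked against pasted output from another machine, which is handy when comparing reports from testers.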

I'd like experienced Funtoo users to have a go at running it and check that the output is correct and sane for their CPU. If something isn't right, could you please provide the following information from your test system:

Python version
Gcc version
cpu identification strings from /proc/cpuinfo (so I can look it up)
cpuflags from /proc/cpuinfo
script output
output of 'gcc -march=[whatever your cpu arch is according to gcc] -Q --help=target', trimmed to only include the top section (down to 'known assembler dialects')
what you believe the output should be
if funtoo is installed, what subarch is installed.
any other possibly useful information (like if you know your cpu is a weird one like the previously mentioned 'not a westmere'.)

Thanks in advance.

grouche · June 9, 2023

Hi @loonylion thanks for your efforts on this! Selecting the best subarch for a system can really be a challenge...

I've been using the Skylake subarch on my laptop with an Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz without issue, but your script says I should use Broadwell instead because the SGX cpuflag is missing.

I found this informative article on it: https://phoenixnap.com/kb/intel-sgx

Great work on this script though! It gives a ton of output while still being readable 😄

EDIT: I took ANOTHER look in the BIOS options and found it: with Intel SGX enabled in the BIOS, your script now recommends the Skylake subarch.

skylake - Estimated 94.87% match: 37 / 37 flags intersect: fxsr xsave sse4 vzeroupper mwait clflushopt adx avx f16c sse4.2 xsaveopt xsaves avx2 80387 mmx sse2 movbe sahf cx16 sse3 sgx bmi2 fsgsbase ssse3 popcnt rdrnd rdseed sse4.1 pclmul xsavec crc32 lzcnt aes prfchw bmi sse fma
5 Implied: lzcnt sse4 crc32 prefetch vzeroupper - Implied by: ABM SSE4_1 SSE4_2 3DNOWPREFETCH AVX
2 Extras: abm prefetch

Edited June 9, 2023 by grouche

loonylion · June 9, 2023

This is interesting, especially the SGX bit, considering it can be disabled in the BIOS and has also been removed from >= 12th gen. Perhaps I need to consider ignoring SGX.

Currently working on a new version of the script that takes cache sizes into account.

EDIT 14/Jun/2023

Script has been significantly updated and now needs testing again :P

Changes:

Now ignores SGX instruction set, due to being bios settable (and disabled by default on supported platforms apparently). Thanks @grouche for spotting that.
After discussions with @drobbins and further digging around in gcc, the script can now check L1/L2 cache sizes.
After adding the above functionality, the methodology used to rank uarches no longer made sense. Previously they were ranked by the number of instruction set intersections, and the percentage (intersections over flags available on the cpu) was calculated later for display purposes only. This has been changed.
Ranking is now done by percentage using the following methodology: the percentage is calculated as above (intersections over flags available on the cpu). Cache sizes are queried and mismatches (i.e. gcc expecting a larger cache than the cpu has) are penalised by halving the percentage for L1 mismatches, and/or deducting a quarter of the percentage value for L2 mismatches. Finally, a further 5% is deducted for every non-ignored instruction set that is present on the cpu but is disabled in the gcc profile (using -mno-).
uarches with cache mismatches are reported separately after the initial ranking (which they are not included in)
Tried to condense output to 1 line per uarch (excluding the recommended uarch). This has entailed replacing tabs with a variable number of spaces, so may have slightly reduced readability.
Script also identifies uarches where the current cpu actually has MORE cache than the profile expects. However, this does NOT impact ranking or percentage calculation at all; it is purely informational.
Added separator character lines to try to make the output more digestible.
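The ranking methodology described in the list above can be sketched roughly as follows. This is an illustration, not the code from the script: the function name and parameters are hypothetical, and three details are assumptions because the description leaves them open. The L2 penalty is taken from the value after any L1 halving, "5%" is read as five percentage points, and the result is clamped at zero.

```python
def score_uarch(cpu_flags, gcc_enabled, gcc_disabled=frozenset(),
                l1_mismatch=False, l2_mismatch=False,
                ignored=frozenset({"sgx"})):
    """Return a percentage score for one gcc -march profile.

    cpu_flags    -- instruction sets present on the cpu (gcc naming)
    gcc_enabled  -- instruction sets the profile enables
    gcc_disabled -- instruction sets the profile turns off with -mno-
    l1_mismatch / l2_mismatch -- True when gcc expects a larger cache
    ignored      -- instruction sets left out of every calculation
    """
    cpu = set(cpu_flags) - ignored
    wanted = set(gcc_enabled) - ignored
    if not cpu:
        return 0.0

    # Base score: intersections over flags available on the cpu.
    pct = 100.0 * len(cpu & wanted) / len(cpu)

    # Cache penalties (assumption: L2 is applied to the running value).
    if l1_mismatch:
        pct /= 2
    if l2_mismatch:
        pct -= pct / 4

    # Assumption: 5 percentage points per flag the cpu has but the
    # profile explicitly disables.
    disabled_present = (set(gcc_disabled) - ignored) & cpu
    pct -= 5.0 * len(disabled_present)

    return max(pct, 0.0)  # clamping is an addition for this sketch


if __name__ == "__main__":
    cpu = {f"flag{i}" for i in range(39)}      # 39 flags on the cpu
    profile = {f"flag{i}" for i in range(37)}  # 37 of them in the profile
    print(round(score_uarch(cpu, profile), 2))
    print(round(score_uarch(cpu, profile, l1_mismatch=True), 2))
```

With 37 matching flags out of 39 on the cpu, the base score works out to 94.87%, which is the same figure as in the skylake output posted earlier in the thread if the two extras there are counted in the denominator.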

Edited June 14, 2023 by loonylion
update, formatting
