I have been playing around with data science for a while, and recently I have focused on feature selection as a way to improve the predictive power of ML models. I wanted to reach out to the community and hear what your experience has been.
I used scikit-learn's permutation_importance() function to estimate the importance of each feature in a toy Heart Failure dataset, and got these results:
The variables 'time' and 'ejection fraction' turned out to be the most predictive of my outcome variable.
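For anyone who wants to try the same thing, here is a minimal sketch of that workflow. It is not my exact notebook: the data comes from make_classification as a stand-in for the Heart Failure dataset, and the random forest, the 70/30 split, and the feature names are assumptions made only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 6 features, only the first 2 carry signal.
# shuffle=False keeps the informative columns at positions 0 and 1.
X, y = make_classification(
    n_samples=600,
    n_features=6,
    n_informative=2,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# Hold out a test set so importance is measured on data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Model choice is an assumption; any fitted scikit-learn estimator works here.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle each column n_repeats times and record the drop in test score.
result = permutation_importance(
    model, X_test, y_test, n_repeats=20, random_state=0
)

# Rank features from most to least important.
ranking = sorted(
    zip(feature_names, result.importances_mean, result.importances_std),
    key=lambda row: row[1],
    reverse=True,
)
for name, mean, std in ranking:
    print(f"{name:<10} {mean:+.4f} +/- {std:.4f}")
```

Measuring on the held-out split rather than the training data is a deliberate choice: importances computed on training data can make features the model has overfit to look useful.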
The next step was to see whether the same process could be used to predict the performance of an automated trading strategy. I chose two features: the slope of the SMA50, and a categorical variable produced by an ML algorithm that classifies the market into 1 of 11 distinct regimes. I expected these two variables to have some predictive power, but got the following results:
As the picture shows, neither variable predicts better than two randomly generated controls, one categorical and one numerical. I repeated the experiment several times with the same results.
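The control comparison can be sketched roughly as below. Everything here is synthetic and hypothetical: "signal" is a made-up feature that really does drive the target, "candidate" is a made-up feature with no relation to it, and the binary target, the random forest, and the integer coding of the 11-level categorical control are all assumptions, not my actual strategy data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000

# Hypothetical features (all synthetic, for illustration only).
names = ["signal", "candidate", "random_num", "random_cat"]
signal = rng.normal(size=n)                 # truly related to the target
candidate = rng.normal(size=n)              # unrelated to the target
random_num = rng.normal(size=n)             # numerical control
random_cat = rng.integers(0, 11, size=n)    # categorical control, 11 integer-coded levels
X = np.column_stack([signal, candidate, random_num, random_cat])

# Binary target (assumption): depends on "signal" plus noise.
y = (signal + 0.5 * rng.normal(size=n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

result = permutation_importance(
    model, X_test, y_test, n_repeats=30, random_state=42
)
importances = dict(zip(names, result.importances_mean))

# A feature is only interesting if it scores above the best random control.
control_ceiling = max(importances["random_num"], importances["random_cat"])
beats_controls = {
    name: bool(score > control_ceiling) for name, score in importances.items()
}

for name in names:
    print(f"{name:<11} {importances[name]:+.4f}  beats controls: {beats_controls[name]}")
```

Comparing against the best control by mean importance is the simplest rule. A stricter variant would also take result.importances_std into account, so that a feature sitting only marginally above the controls is not counted as a win.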
I am starting this thread to see whether someone with data science experience can share some practical insights. I have a few ideas for variables I could use, such as sentiment, previous performance, volatility, and momentum, but I wanted to hear what others have tried before I go down this rabbit hole.