Combining discretionary trading, risk management and ML as an art
At this point in the thread I think I need to add a real world example.
The course has been good so far, but now it has its own 'concept drift': very brief videos by the new lecturers.
In the lesson where they did time series analysis (Apple stock), I sensed that the instructors knew machine learning and stats but did not connect them to the real world, and specifically to trading. And that of course is the key to this thread: moving from the abstract to the real world.
(As a small digression, I remember thinking how educators could do a real service by interviewing people who are working in their fields and asking them what the good and bad of their professions are -- what the reality is.)
So I have created an ARFF file (the Weka format) with CL daily OHLC & volume. Weka has readers to pull in CSV and Excel formats (among others, such as databases). However, the CSV converter gave errors, and the Excel reader showed my date as numeric (the format one sees for a date, 25-Oct-2017 or 10/25/2017 etc., is not how the spreadsheet stores it; 43034 is a numeric serial, not a date).
So it was quite a bother to find a workaround to create a file with the date as a date and not a numeric value.
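If you hit the same numeric-date problem, one way around it is to convert the Excel serial numbers yourself before building the ARFF file. Here is a minimal sketch in Python (the serial value shown is just an example; Excel's 1900 date system has a quirky epoch):

```python
from datetime import date, timedelta

def excel_serial_to_date(serial: int) -> date:
    # Excel's 1900 date system counts from an effective epoch of 1899-12-30
    # (this absorbs the phantom 1900-02-29); valid for serials after Feb 1900.
    return date(1899, 12, 30) + timedelta(days=serial)

# A spreadsheet cell displaying a late-October 2017 date is stored as a
# serial number like this behind the scenes:
print(excel_serial_to_date(43033))  # -> 2017-10-25
```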
In the 3rd Weka course they show how to use the forecasting package, with an example of forecasting tomorrow's closing price of Apple stock from the previous closes. The mean absolute error is used as the measure of forecasting success.
For day traders, an estimate of today's closing price is not the first thing on our list of things we would like estimates of. Now, for the Apple stock the error is considered small by the instructor (who does not know trading and is judging relative to other kinds of ML problems).
I got to thinking: what is the baseline error?
If you give a trader an O/H/L/C daily series, the very first thing to look at is the change from one day's close to the next.
1. A simple prediction without ML: use yesterday's change as the estimate of today's.
Others might be:
2. Yesterday's close as an estimate of today's (we know this won't be exactly right on most days, but in a flat trend the mean absolute error could be low).
3. Today's open as an estimate of the close.
4. Today's open plus yesterday's open-to-close change.
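To make the four baselines concrete, here is a small sketch in Python (the price rows are made up for illustration; `mae` is plain mean absolute error, no ML anywhere):

```python
# Hypothetical (open, close) pairs for consecutive days -- illustrative only.
rows = [
    (50.10, 50.69),
    (50.60, 50.55),
    (50.40, 50.90),
    (50.95, 50.35),
]

def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)

opens  = [o for o, _ in rows]
closes = [c for _, c in rows]

# 1. yesterday's close-to-close change as the estimate of today's change
b1 = mae([(closes[i] - closes[i-1]) - (closes[i-1] - closes[i-2])
          for i in range(2, len(rows))])
# 2. yesterday's close as the estimate of today's close
b2 = mae([closes[i] - closes[i-1] for i in range(1, len(rows))])
# 3. today's open as the estimate of today's close
b3 = mae([closes[i] - opens[i] for i in range(len(rows))])
# 4. today's open plus yesterday's open-to-close change
b4 = mae([closes[i] - (opens[i] + (closes[i-1] - opens[i-1]))
          for i in range(1, len(rows))])
print(b1, b2, b3, b4)
```

Any ML scheme that cannot beat all four of these numbers on held-out data is not adding value.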
If you would like to try time series analysis and forecasting, you could download Weka and then the forecasting add-on package. With those you can watch the YouTube lesson:
Advanced Data Mining with Weka (1.3: timeseriesForecasting package)
I suggest pausing and duplicating what the instructor is doing with the Apple stock prices.
I have created an ARFF file for CL daily prices (see attached zip) for you to play/learn with.
Weka has data loaders for CSV, DB, and XLS (Excel) files (I haven't tried the DB loader). Though the XLS loader works fine for most files, for ones with a date column it sees the date as numeric, not a date, so I went through some machinations with Notepad and then manually created the ARFF header.
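For anyone attempting the same thing, the shape of a hand-built header is roughly this (the attribute names and the sample row below are illustrative placeholders, not my actual data; the date format string follows the pattern ARFF date attributes use):

```
@relation CLdaily

@attribute date date "yyyy-MM-dd"
@attribute open numeric
@attribute high numeric
@attribute low numeric
@attribute close numeric
@attribute volume numeric

@data
2017-09-20,50.12,50.80,49.90,50.69,612345
```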
Here is the Weka Explorer showing how you open the CLdaily.arff file, and the Forecasting tab (which shows once you add in the package).
I have run the CLdaily file 3 times, once with each of the learning schemes.
The goal is to get the lowest Mean Abs Error.
All 3 learning schemes are about the same on the test data (the last 30 percent of the data, held out for testing).
$0.79, $0.84, $0.81 (linreg, SMO, NN respectively).
On the training data (the first 70% of instances) the results are:
$0.98, $0.97, $1.49 (linreg, SMO, NN respectively).
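A note on the hold-out: for time series the split must be chronological, never shuffled -- you train on the earlier part and test on the later part. A trivial sketch:

```python
# Chronological 70/30 split -- no shuffling; order matters for time series.
data = list(range(100))          # stand-in for 100 daily instances
cut = int(len(data) * 0.7)
train, test = data[:cut], data[cut:]
print(len(train), len(test))  # -> 70 30
```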
Though ML people who are good with their systems might think mean absolute errors of $0.79, $0.84, and $0.81 (linreg, SMO, NN respectively) are good results, as traders with common sense and real-world experience we know better where trading is concerned.
Predicting the close isn't much good to a trader. Even if it were, and I told you my estimate of the close was on average $0.79 from the actual, you wouldn't be impressed.
$50.35 (the prediction for the 21st of Sept 2017 from linreg; see the data in CLrun1.txt) plus and minus $0.79 is a range from $49.56 to $51.14.
"I think the close on the 21st will be $50.35, but with an average error of 79 cents it could be as high as $51.14 and as low as $49.56." (And that is only the average (mean) error; it could be much wider.)
---------
Actual data
2017-09-20 close = 50.69 chg 0.79
2017-09-21 close = 50.55 chg -0.14
----------
Pretty useless for trading info.
Don't apply ML blindly.
The class data must be nominal (not numeric). You can discretize your data; this is a binning process that employs an entropy heuristic (it chooses the split points with the largest information gain).
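To show what that entropy heuristic is doing, here is a bare-bones sketch of choosing a single cut point by information gain. (Weka's supervised discretization applies this recursively with an MDL stopping rule; this sketch is just the core step, with made-up toy data.)

```python
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Pick the cut point on a numeric attribute with the largest
    information gain -- the core step of supervised discretization."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_cut = 0.0, None
    for i in range(1, len(pairs)):
        left  = [l for _, l in pairs[:i]]
        right = [l for _, l in pairs[i:]]
        gain = base - (len(left) * entropy(left)
                       + len(right) * entropy(right)) / len(pairs)
        if gain > best_gain:
            best_gain = gain
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_cut, best_gain

# Toy example: values cleanly separated by class -> the cut lands between
# the two groups and the gain is the full class entropy (1 bit).
print(best_split([1, 2, 3, 10, 11, 12], list("aaabbb")))  # -> (6.5, 1.0)
```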
Course 2.2 More Data Mining with Weka (2.2: Supervised discretization and the FilteredClassifier)
In working with my dataset I found that if I used the discretization filter on the class attribute, the filter would discretize all the other non-nominal attributes as well. I didn't want this, so I am back in the spreadsheet holding my database, discretizing the class attribute by hand. The first step is to decide on the binning. Being familiar with the data of the class attribute, I realized I wanted unequal bin sizes.
The goal is a model with the most predictive value to me for trading. Therefore, without knowing the results of the model run, I can still make some decisions on the binning of the class attribute.
If there are too many bins, I will not have enough instances in each bin. Too few, and the bins will not have any predictive value.
I have 594 instances in my training dataset. I have decided on 12 bins (not all the same size).
Here is the frequency distribution of the data at equal bins of one unit each:
The vertical blue lines are where I decided to set the bin edges.
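Once the edges are chosen, mapping the numeric class values onto nominal bins is mechanical. A sketch with illustrative edges (these are not my actual values -- any sorted list works; 11 edges partition the line into 12 bins):

```python
import bisect

# Hand-chosen, unequal bin edges (illustrative values only).
# 11 sorted edges give 12 bins, labeled bin0 .. bin11.
edges = [-2.0, -1.0, -0.5, -0.25, -0.10, 0.0, 0.10, 0.25, 0.5, 1.0, 2.0]

def to_bin(x: float) -> str:
    # bisect_right returns an index 0..len(edges), i.e. one of 12 bin labels
    return f"bin{bisect.bisect_right(edges, x)}"

print(to_bin(-1.7), to_bin(0.05), to_bin(3.0))  # -> bin1 bin6 bin11
```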