In this thread I am trying to put forward that our knowledge of trading, earned in the trenches, helps us when we move forward in applying possible tools such as ML.
There are a lot of dynamics to trading: risk management, stop-running algos, surprise moves, your own broker trading against you (brokers "may act as principal" and are allowed to "internalize" a trade - i.e. matching the orders of two clients and never sending them to the exchange, keeping the edge), money management, psychology, the lack of a clear road map, central bank market manipulation, volatility suppression, asset class correlation convergence (via risk parity funds, interest rate suppression, buybacks and index hedging), low capitalization, etc. etc. The list is large.
So it may be that trading experience and knowledge will help us better understand the limitations of ML, the potential of ML, and its correct application.
..........
peace, love and joy to you
.........
There are 3 courses on Weka; I am now doing the 3rd one. In the most recent lessons we have gotten to time series analysis.
In lesson 1.2 he does things one step at a time - manually, so you can see the process - and in the next lesson he uses the forecasting plug-in package.
First lesson manually
First he does a straight regression and then a manually lagged one. The dataset is the airline data, which has a 12-month cyclicality to it, so he creates a 12-month lagged series (attribute). He then deletes the first 12 months (instances), as linear regression does bad things with missing data.
Here's the straight-line regression on the data:
Here's the result of the 12-month lag-and-regress:
With the lagging you get, in effect, a cyclical model and quite a good fit.
source:
Advanced Data Mining with Weka (1.2: Linear regression with lags)
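The same idea can be sketched outside Weka in a few lines of plain Python. This is only a toy illustration of the lag-and-regress step: the series below is a made-up stand-in (trend plus a 12-month seasonal swing), not the real airline numbers.

```python
import math

def fit_simple_regression(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Toy monthly data: upward trend + 12-month seasonality
# (a hypothetical stand-in for the airline dataset)
series = [100 + 2 * t + 20 * math.sin(2 * math.pi * t / 12)
          for t in range(60)]

# Build the 12-month lagged attribute, then drop the first 12 instances
# (they have no lagged value -- the same reason the instructor deletes them).
lagged = series[:-12]    # value 12 months ago
current = series[12:]    # value now

a, b = fit_simple_regression(lagged, current)
forecast = a + b * series[-1]   # predict one year past the last point
```

Because the seasonal component repeats every 12 months, regressing on the lag absorbs the cycle and the fit comes out very tight, which is the effect the lesson demonstrates.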
In ML you often do the training of your model with 10-fold cross-validation (reusing 9 of the 10 data slices in different combinations). However, you can't do this with time series data because the order matters.
So with time series data you hold out part of your data set for testing (training on the rest).
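An order-preserving holdout is trivial to sketch. The 85/15 ratio here is just an assumption for illustration (the exact fraction is a judgment call):

```python
# Order-preserving holdout for time series: no shuffling, no cross-validation.
data = list(range(100))        # stand-in for 100 time-ordered instances

split = int(len(data) * 0.85)  # assumed 85/15 split
train, test = data[:split], data[split:]

# Every test instance comes strictly after every training instance.
assert max(train) < min(test)
```

Contrast this with 10-fold cross-validation, which would shuffle future instances into the training folds and leak information backwards in time.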
As you know in trading your data can go through "phases". So you can have:
an uptrend phase followed by
a sideways phase followed by
another upward phase followed by
a sideways phase followed by
as an example.
If you were building a model for the Close to Close (C:C), then a split that simply holds back the last 15% of your data and trains on the rest will ignore the difference in phases (Note 1).
All traders understand these phases (market context), and so understand that just dumping all the data in and turning the crank may not give you the results you are looking for. This is the problem of "averaging" across phases.
------------ Note1: At least to my knowledge at this time
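One standard way to soften the single-holdout problem is walk-forward (rolling-origin) evaluation, where the train/test boundary slides forward so every phase eventually gets tested. This technique is not from the Weka course - it's just a common practice - and the sizes below are arbitrary:

```python
# Walk-forward (rolling-origin) splits: instead of one fixed holdout,
# slide the boundary forward so each "phase" of the data is tested.
def walk_forward_splits(n, initial_train, test_size):
    """Yield (train_indices, test_indices) pairs moving forward in time."""
    start = initial_train
    while start + test_size <= n:
        yield list(range(0, start)), list(range(start, start + test_size))
        start += test_size

# 100 instances: train on the first 40, then test 20 at a time
splits = list(walk_forward_splits(100, 40, 20))
# 3 folds: train 0-39 / test 40-59, train 0-59 / test 60-79,
#          train 0-79 / test 80-99
```

Per-fold error then shows whether the model only works in some phases (e.g. trending) and falls apart in others (e.g. sideways).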
The automatic time series forecasting package at default settings can give a very low error when trained and evaluated on the entire data set (which you shouldn't do) - a symptom of over-fitting. Overfitting is a bad thing where your model fits the data so tightly that it will probably not forecast well on new data.
The automatic program does this by generating a large number of attributes, giving a very complex model and a misleadingly high fit on the entire dataset. If we instead hold out the last 24 instances for testing, we can see the difference, showing overfitting from too many attributes (attributes are like columns in a spreadsheet; instances are the rows).
source:
Advanced Data Mining with Weka (1.3: timeseriesForecasting package)
NaiveBayes often works very well even when the independence assumption clearly doesn't apply.
In time series analysis you have to put your thinking hat on and be careful to include only meaningful variables (attributes). Otherwise you can easily overfit and end up with a model that doesn't forecast well.
ZeroR counts class occurrences and predicts everything as the most common class.
So if you had 1000 days with 600 up days and 400 down days, it would predict (classify) all days as up days - as that is the most numerous class - and it would be 60% correct.
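ZeroR is simple enough to write out in a few lines; this sketch (plain Python, not Weka's implementation) reproduces the 600/400 example:

```python
# Minimal ZeroR baseline: predict the most common class for everything.
from collections import Counter

def zero_r(labels):
    """Return (majority_class, baseline_accuracy)."""
    counts = Counter(labels)
    majority, count = counts.most_common(1)[0]
    return majority, count / len(labels)

# 1000 days: 600 up, 400 down -- as in the example above
days = ["up"] * 600 + ["down"] * 400
cls, acc = zero_r(days)   # -> ("up", 0.6)
```

Any model worth keeping has to beat this number, which is the point of the "Eureka!" warning below.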
Let's say you are building an ML model for predicting up days on the ES, and your data starts in March 2009.
If you run your model and get 65% accuracy you might shout "Eureka!" - until you run ZeroR and realize that 65% of the days were up days, so your model didn't give you any new insight.
There is a time forecasting plug-in package for Weka.
It comes with a dataset of Apple stock prices from 2011: date, open/high/low/close and adjusted close.
If you open it and just run the forecast, the package will interpolate values for the date holes, i.e. weekends and holidays. This will throw off your results, so you tell the program to skip weekends and date1, date2, ... (the dates of the holidays when the market is closed). The instructor predicted the closing price.
For day traders we are interested in intraday values. All the values are assigned to the same date,
that is, o, h, l, c are all on the same row - even though these happen at different times of the day.
I was thinking it would be nice to have a third dimension, time (of the day) but I don't know how to set this up.
source:
Advanced Data Mining with Weka (1.3: timeseriesForecasting package)
Advanced Data Mining with Weka (2.1: Incremental classifiers in Weka)
Advanced Data Mining with Weka (2.2: Weka’s MOA package)
MOA can be run on multiple computers or on just one. It can be run from the Explorer, from the command line, and via the Java API.
massiveOnlineAnalysis: MOA (Massive On-line Analysis).
URL: https://moa.cms.waikato.ac.nz/
Massive On-line Analysis is an environment for massive data mining. MOA provides a framework for data stream mining and includes tools for evaluation and a collection of machine learning algorithms. Related to the WEKA project, also written in Java, while scaling to more demanding problems. IMPORTANT: This package requires a separate download of the MOA jar file in order to work. After installing the massiveOnlineAnalysis Weka package: 1) download MOA from https://www.cs.waikato.ac.nz/~abifet/MOA, 2) copy the "moa.jar" file from the MOA distribution to $HOME/.weka/packages/massiveOnlineAnalysis/lib, 3) re-start Weka.
Advanced Data Mining with Weka (2.3: The MOA interface)
In prequential evaluation, with each incoming instance we first test and then train.
One can use it on infinite data streams: it processes each new instance, tests on it, and then learns from it.
As one can connect it via the Java API, I would imagine it can listen to an incoming data stream (though I don't have the faintest idea how to do this), updating your ML model in real time.
It handles concept drift, which for traders would be changes in market conditions - so your model adapts to changes in trend, volatility, etc.
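The test-then-train loop itself is easy to sketch. This is a toy Python stand-in, not MOA: the "model" is just an online majority-class counter, but the evaluation order (score first, learn second) is the prequential idea.

```python
# Prequential ("test-then-train") evaluation sketch.
# For each arriving instance: 1) score the current model's prediction,
# 2) then let the model learn from it. Accuracy never touches future data.
from collections import Counter

def prequential(stream):
    counts = Counter()        # toy online "model": running class counts
    correct = 0
    for label in stream:
        if counts:                                   # 1) test first
            prediction = counts.most_common(1)[0][0]
            if prediction == label:
                correct += 1
        counts[label] += 1                           # 2) then train
    return correct / max(len(stream) - 1, 1)         # first instance untested

stream = ["up", "up", "down", "up", "up", "down", "up"]
accuracy = prequential(stream)
```

A real incremental classifier (e.g. one of MOA's) would replace the counter, but the loop structure - and the fact that it runs forever on an unbounded stream - stays the same.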
It can also split the workload amongst several machines
-- all the above is just my understanding of it and the programmers out there could say better. ---------