It's such a bear working with large data sets... this one is pretty large.
2,616,997 rows so far, and not quite two-thirds of the way through..
There is Sooooo much one can do with this much data once it's prepped..
Heck, this is just putting the sets together the way I want them...
How about using the raw data to classify candles against their next iteration's results?
People claim lots of things about candles and patterns.. it'd be interesting to see if they are right,
or whether it was wishful thinking on their part..
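One way to frame "classify candles against their next iteration's results" is sketched below. It is a minimal sketch under assumptions that are not from the post: the hypothetical `Candle` tuple holds one OHLC bar, `classify` labels a bar as bull, bear, or doji (a body of 10% of the range or less is treated as a doji, an arbitrary threshold), and the "result" of the next bar is taken to be whether it closes above or below the current close.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Candle(NamedTuple):
    # Hypothetical container for one OHLC bar.
    open: float
    high: float
    low: float
    close: float


def classify(c: Candle, doji_ratio: float = 0.1) -> str:
    """Label a candle by its body; the 10% doji threshold is an assumption."""
    rng = c.high - c.low
    body = c.close - c.open
    if rng == 0 or abs(body) <= doji_ratio * rng:
        return "doji"
    return "bull" if body > 0 else "bear"


def next_bar_outcomes(candles: list[Candle]) -> dict[str, Counter]:
    """For each candle label, count whether the NEXT bar closed up, down or flat
    relative to the current bar's close."""
    table: dict[str, Counter] = defaultdict(Counter)
    for cur, nxt in zip(candles, candles[1:]):
        if nxt.close > cur.close:
            outcome = "up"
        elif nxt.close < cur.close:
            outcome = "down"
        else:
            outcome = "flat"
        table[classify(cur)][outcome] += 1
    return dict(table)
```

Run over millions of rows, the resulting table gives the conditional frequencies ("after a bull candle, the next close was up X% of the time") that a pattern claim could be checked against.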
Anyway... this is about halfway through,
and there are at least four more sets to go..
It's going to take into the weekend, probably....
That's the issue with deep learning, expert systems, or ANN machine learning,
whatever you want to call it: the biggest chunk of time is spent on data prep and training.
Both let you get up and watch TV and eat and take naps...
Today was a good day...
Yesterday I lost about $50...
today I made $676.50.
Can't complain..
but I can biatch that if I had waited on one of the stocks I bought, it would have earned a lot more than my day trade in options did.
It went up 12 points... don't you hate that?
Tomorrow my credit spreads will expire profitable... so I can't complain about that...
AND if things work out on the ACHN shares,
they will deposit 70k into my account...
a 47 times profit!!!!
That's probably the first and last time that will happen in my life...
but here's hoping.. (and I won't believe it till I get the actual cash in my account!!!)
Either way, the stock and options play should net another two payments of 20k each if the CVR conditions are met.
I do believe one of them will be met... but I'm unsure about the 2nd... but that's all icing, I guess...
The stock data has been rolling all night long...
and it has three more passes to go through after this one finishes...
Working with large datasets is a PAIN...
but I will say that they are the only way to go if you're going to program neural nets with PyTorch..
Well, I did well yesterday..
up $1,160.50.
Today I am up..
but I'll wait till I have a final total, given how changeable things in life can be...
On another note...
I have to rebuild the database... (I found a new source of data)
The new one will be even bigger, and go back farther in time!!!
That's the good news..
The bad news is how hard it is to work with that much data.
I mean it's REALLY hard... even though I have built myself a monster box,
it's not a Cray...
Will update with stats later as to how many tickers and how many years...
If 37 million days was a lot before... this will be even larger...
Time to put the old database to rest...
[isn't it nice that my updates are all written to myself...]
I just finished acquiring the data..
HUGE pool... but uneven quality, given it's 'adjusted prices'.
What do they mean by that?
Well, from what I can tell, they adjust the old prices downward on splits..
either that, or stock that was sold in 1970 had a zero price...
This is going to be an issue of cleaning things up AFTER I get everything into the MSSQL database.
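To show why split adjustment can make very old prices look like zero, here is a minimal sketch of back-adjustment. It assumes split events are available as hypothetical `(split_date, ratio)` pairs, where a ratio of 2.0 means a 2-for-1 split; the vendor's actual method is not known from the post. Every close dated before a split is divided by that split's ratio, so a stock with many splits since 1970 gets its old prices divided by a large compounded factor. If the vendor then rounds to cents, those values could end up at 0.00.

```python
from datetime import date


def back_adjust(bars: list[tuple[date, float]],
                splits: list[tuple[date, float]]) -> list[tuple[date, float]]:
    """Divide every close dated BEFORE a split by that split's ratio.

    ratio 2.0 means a 2-for-1 split; several splits compound multiplicatively.
    """
    adjusted = []
    for day, close in bars:
        factor = 1.0
        for split_day, ratio in splits:
            if day < split_day:
                factor *= ratio
        adjusted.append((day, close / factor))
    return adjusted
```

Keeping the split list in its own table means the raw prices can be recovered later by multiplying the factor back in, instead of trusting the vendor's adjusted column.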
Created a new DB.. now I'll write the software to put the data in..
Then I'll compare it to the old database, as many companies I have in that one are now gone,
and gone companies tend not to be kept in new databases...
but their trading data (for CNN neural nets) is still good...
Then I'll have to see if it's possible to merge some of this data..
How much do I have?
Well... 19,000 companies now... almost double what I had before..
and prices in some cases go back to the 1960s,
so I am unsure whether I will bother loading ALL the data..
After all, how relevant is it to see what prices and price action there were before online discount trading,
and before HFT machines came into normal use?
I will PROBABLY load it all..
then I will back it up,
then I will delete everything prior to a certain date..
I just have to figure out what that date should be..
I would ask for ideas, but I don't think anyone is really reading my journal...
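The "load it all, back it up, then delete everything prior to a certain date" plan could look roughly like this. The post uses MSSQL; this minimal sketch swaps in Python's built-in `sqlite3` so it runs anywhere, and the `prices` table with its `trade_date` column is a hypothetical schema. Dates are stored as ISO `YYYY-MM-DD` strings, which compare correctly as text.

```python
import sqlite3


def backup_then_prune(conn: sqlite3.Connection, backup_path: str, cutoff: str) -> int:
    """Copy the whole database to backup_path, then delete rows before cutoff.

    Returns the number of deleted rows. Table and column names are hypothetical.
    """
    dst = sqlite3.connect(backup_path)
    try:
        conn.backup(dst)  # full copy made before anything is deleted
    finally:
        dst.close()
    with conn:  # commits on success, rolls back if the DELETE fails
        cur = conn.execute("DELETE FROM prices WHERE trade_date < ?", (cutoff,))
    return cur.rowcount
```

Because the cutoff is a parameter, trying a few candidate dates against copies of the data is cheap before settling on one.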
Here is the histogram of 11,000 companies since Jan 2007 with prices between $10 and $500
[my other database has 19,000 companies going back to the 1960s]
As you can see...
normalizing this data would not work that well...
Doing this kind of data analysis is really necessary, or else whatever effort is put in will probably fail.
This has been a pain.. but it's a necessary pain...
When Python couldn't handle the data..
MSSQL to the rescue! Actually... if anyone here is doing neural nets,
I would advise loading data into SQL Server if you have large datasets.
SQL Server can handle huge amounts in record time without dragging your system down.
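Part of why a database copes better than holding everything in Python memory is that rows can be streamed in fixed-size batches. This minimal sketch again uses the built-in `sqlite3` module as a stand-in for SQL Server (with SQL Server you would go through an ODBC driver such as pyodbc, or use T-SQL `BULK INSERT` for flat files); the `prices` table and the batch size of 50,000 are assumptions.

```python
import sqlite3
from itertools import islice
from typing import Iterable


def load_in_batches(conn: sqlite3.Connection,
                    rows: Iterable[tuple[str, str, float]],
                    batch_size: int = 50_000) -> int:
    """Insert (ticker, trade_date, close) rows in fixed-size batches.

    Only one batch is held in memory at a time, so `rows` can be a lazy
    iterator (for example a csv.reader over a huge file).
    """
    it = iter(rows)
    total = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        with conn:  # one transaction per batch
            conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", batch)
        total += len(batch)
    return total
```

One commit per batch is a middle ground: committing every row is slow, while one giant transaction for millions of rows holds a lot of uncommitted work.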
In this case, I took the original tables, sorted them, and got the high and low values for each column,
then built a query that would report all the old fields and add the normalization across the whole data set.
So I would not have to do it again, I did log(x), z-score, and linear range (min-max).
I also clipped the data to stock prices between 10 and 71, because this is the lion's share of stocks over history.
Yes, there are stocks above 71, of course... but the number of price records they have is dwarfed by the others.
Prices between 10 and 71 represent 11,230,434 rows, from Jan 2017 to sometime in 2019,
while the number of records for stock prices above 71 is 1,481,529: an obvious big difference..
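The filtering and the three normalizations described above can be sketched in a few lines. This is a minimal illustration, not the actual SQL query from the post: it assumes "clipped" means rows outside $10 to $71 are dropped (as the row counts suggest), rather than capped the way `numpy.clip` would do, and it works on a plain list of closing prices. The function name `filter_and_normalize` is hypothetical.

```python
import math
import statistics


def filter_and_normalize(closes: list[float], lo: float = 10.0,
                         hi: float = 71.0) -> list[dict[str, float]]:
    """Drop closes outside [lo, hi], then add log, z-score and min-max columns.

    Mean, std, min and max are computed over the whole filtered set.
    """
    kept = [c for c in closes if lo <= c <= hi]
    if not kept:
        return []
    mu = statistics.fmean(kept)
    sigma = statistics.pstdev(kept) or 1.0  # guard: all values identical
    cmin, cmax = min(kept), max(kept)
    span = (cmax - cmin) or 1.0             # guard: all values identical
    return [
        {
            "close": c,
            "log": math.log(c),             # natural log; defined because lo > 0
            "zscore": (c - mu) / sigma,
            "minmax": (c - cmin) / span,    # "linear range", scaled into [0, 1]
        }
        for c in kept
    ]
```

One design note: computing these statistics over the whole dataset means later prices influence how earlier rows are scaled. If the data is later split into training and test periods, fitting the mean, std, min and max on the training period only avoids that leak.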
For those curious, this is what the data looks like...
Well, what do you think is over your head? The actual work in detail may be, but understanding things conceptually should not be. For instance... you probably can't build a nuclear power plant, but you can understand how the most important parts work in the abstract, even down to the nuclear reactions! There have even been great videos that use Walt Disney's example of filling a gymnasium with mousetraps and ping pong balls to illustrate how neutrons can cause an unstable uranium atom to fall apart and emit energy and more neutrons, which create a chain reaction.
So, if you want to understand, you can just ask...
I only bite when people are insulting...
and honest curiosity is never insulting