
taotree

In Pursuit of Accurate Data


So... there have been various discussions about data feeds and I'd like to see if I can find a way to ensure I have a quality one. I've heard some recommendations but...

 

Here's why I ask. I would expect data from CQG Data Factory to be of high quality. This is historical data, so connection issues should not be a factor.

 

I downloaded their sample and found this example:

 

F.US.EPM08 20080104 1 0918 144375 B N N 28

F.US.EPM08 20080104 1 0918 144375 B N N 25

F.US.EPM08 20080104 1 0918 144400 A N N 34

F.US.EPM08 20080104 1 0918 144375 T N N 9

F.US.EPM08 20080104 1 0918 144375 T N N 5

F.US.EPM08 20080104 1 0918 144375 T N N 6

F.US.EPM08 20080104 1 0918 144325 B N N 8

F.US.EPM08 20080104 1 0918 144375 A N N 19

F.US.EPM08 20080104 1 0918 144325 B N N 5

 

The bid was 25 at 144375, then three trades totalling 20 (9 + 5 + 6) printed at that price, and then the bid went to 8 at 144325.
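The arithmetic above can be checked mechanically. Here is a minimal Python sketch, assuming the columns in the sample are symbol, date, session, time, price, record type (B/A/T), two flags, and size. That layout is inferred from the rows above, not taken from CQG documentation, and `Tick`, `Gap`, `parse_line` and `trade_gaps` are names made up for illustration. It flags every case where trades print at the bid and the next bid update is not simply "old size minus traded volume" at the same price.

```python
from typing import Iterable, Iterator, NamedTuple

class Tick(NamedTuple):
    price: int  # price exactly as printed in the file, e.g. 144375
    kind: str   # "B" = bid, "A" = ask, "T" = trade
    size: int

class Gap(NamedTuple):
    bid_price: int  # bid level the trades printed at
    bid_size: int   # size shown at that level before the trades
    traded: int     # total trade volume printed at that level
    next_bid: Tick  # first bid update seen afterwards

def parse_line(line: str) -> Tick:
    # Assumed layout: symbol, date, session, time, price, kind, flag, flag, size.
    fields = line.split()
    return Tick(price=int(fields[4]), kind=fields[5], size=int(fields[8]))

def trade_gaps(ticks: Iterable[Tick]) -> Iterator[Gap]:
    """Yield a Gap when trades at the bid are not matched by the next bid update."""
    bid = None
    traded = 0
    for tick in ticks:
        if tick.kind == "T" and bid is not None and tick.price == bid.price:
            traded += tick.size
        elif tick.kind == "B":
            if bid is not None and traded > 0:
                expected = Tick(bid.price, "B", bid.size - traded)
                if tick != expected:
                    yield Gap(bid.price, bid.size, traded, tick)
            bid, traded = tick, 0

SAMPLE = """\
F.US.EPM08 20080104 1 0918 144375 B N N 28
F.US.EPM08 20080104 1 0918 144375 B N N 25
F.US.EPM08 20080104 1 0918 144400 A N N 34
F.US.EPM08 20080104 1 0918 144375 T N N 9
F.US.EPM08 20080104 1 0918 144375 T N N 5
F.US.EPM08 20080104 1 0918 144375 T N N 6
F.US.EPM08 20080104 1 0918 144325 B N N 8
F.US.EPM08 20080104 1 0918 144375 A N N 19
F.US.EPM08 20080104 1 0918 144325 B N N 5
"""

gaps = list(trade_gaps(parse_line(line) for line in SAMPLE.splitlines()))
for gap in gaps:
    print(gap)
```

On the sample this flags one gap: 25 shown at 144375, 20 traded, and the next bid is 8 at 144325, so 5 contracts left the bid without a trade print or an intermediate bid update. A flagged gap is not proof of bad data (orders can be cancelled between messages); it only marks the places worth looking at.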

 

I asked them if that was normal and they said "what you see is normal and happens quite often". But we have 3 trade events in a row without any bid updates, and when we do get a bid update, it doesn't add up.

 

I guess I might be able to get data from CME directly.

http://www.cmegroup.com/market-data/datamine-historical-data/

 

It would be very interesting to see if they have the same issue of numbers not all adding up.

 

What I'm getting at is... it seems incredibly difficult to find and confirm that data is accurate and I'm starting to wonder if it's possible at all.


That is why, ultimately, no backtest is going to give a 100% accurate view of what may happen in reality.

Unfortunately, collecting accurate data is costly. I used a group called CSI data for daily EOD data. They came highly recommended, yet even their data has discrepancies and issues. We went through and investigated a few of them, did some research as to why, and then adjusted a few things... whether that is hindering or helping, I don't know.

But ultimately, for a backtest over a longer period of time, a few ticks' difference should not matter. With higher-frequency trading, I would assume every tick can make a difference.

I guess collecting your own data and reviewing it daily is the final resort... and I think you will unfortunately find these issues everywhere. It all depends on what you require the data for. I have taken the view that I just want data that is close enough, and then I take any test with a grain of salt.

I guess once you get down to collecting your own data and daily reviewing it is ultimately a final resort..... I think you will unfortunately find this everywhere. It all depends on what you require it for. I have taken the view I just want data that is close enough.... and then I will take any test with a grain of salt.

 

I'm a little confused--I would expect historical data to be more accurate than real time data. I was collecting my own data for a while, but I now know that the feed I was using is inaccurate so... collecting data doesn't do me any good unless I know the data I'm getting is good.

 

And as for "close enough"... that's what I'm struggling with. If this type of discrepancy "happens quite often", how do I assess the quality of data? If confusing things happen even in the best-quality data available (and I do not know whether this is the best available), how does one differentiate between the best, the "close enough" and the way off?


By the way... It appears that you can get a direct internet feed to CME for $500/month but... does anyone know of a data distributor that just directly passes on CME data for less than $500/month? Then it would be easy to verify that we're getting the best data (can purchase some historical CME data directly and compare it to confirm). Comparing to data feeds that change the format and such is difficult because it requires manipulation and even interpretation.


Historical data is drawn from live data, so now you understand why there are issues with collecting data, why accurate data is expensive, and all the things that can go wrong with it... there should be no confusion.

There are plenty of discussions already about issues with tick data. With daily data there should be fewer issues, but there still are some...

I go with close enough, as I don't have the time, resources or inclination to even wander down the path of data collection.


 

What I'm getting at is... it seems incredibly difficult to find and confirm that data is accurate and I'm starting to wonder if it's possible at all.

 

Me too. My hunch is that exchanges aggregate data at times. CME certainly reserve the right to, though I must be honest, the last time I looked for the document that mentioned it I could not find it.


Oh, this is fun. So...

 

First of all, let me say that when I looked at the other data, it took me less than a couple of minutes to find data that didn't line up. So either it happens all over the place, or I just got really lucky. Sorry... this is not a rigorous check yet, since I'm still just exploring.

So, I took a look at sample data direct from the CME. I spent a good deal longer, finding only records that matched up (maybe 10 minutes? not sure), but then I found this:

 

From: Top of Book

 

2010011509564006628780EES F100300900 201132252A M M100115

2010011509564006628780EES F100300157 201132002B M M100115

2010011509564006628790EES F100300038 201132002 100115

2010011509564006628800EES F100300900 201132252A M M100115

2010011509564006628800EES F100300120 201132002B M M100115

 

What this is showing is:

First: ask = 900, bid = 157

Then a trade of 38

Then: ask = 900, bid = 120

 

Oops, we gained an extra one: 157 - 38 = 119, but the bid shows 120. So... there is at least one example of the CME's own data not lining up. Apparently the CME doesn't require them to match up.

 

By the way... it's not that a 1 difference in the bid is going to make or break a strategy. It's that these discrepancies are making it difficult for me to assess the accuracy of a data feed.

 

Another one:

 

2010011509564006628980EES F100300870 201132252A M M100115

2010011509564006628980EES F100300098 201132002B M M100115

2010011509564006628990EES F100300001 201132002 100115

2010011509564006629000EES F100300870 201132252A M M100115

2010011509564006629000EES F100300096 201132002B M M100115

 

And another:

 

2010011509564006629250EES F100300083 201132002A M M100115

2010011509564006629250EES F100301185 201131752B M M100115

2010011509564006629260EES F100300001 201132002 100115

2010011509564006629270EES F100300076 201132002A M M100115

2010011509564006629270EES F100301183 201131752B M M100115

 

Found those two quickly after the first... Okay... now time to ask CME directly about this and see if I can dig into what's going on.
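The three checks above can be reproduced with a short Python sketch. It rests on assumptions read off the samples rather than on the CME file specification: tokens are whitespace-separated, the size is the last five digits of the second token, and the third token is the raw price followed by an optional A/B side letter (no letter meaning a trade). `Record`, `parse_record` and `size_discrepancy` are illustrative names only.

```python
import re
from typing import List, NamedTuple

class Record(NamedTuple):
    side: str   # "A" = ask, "B" = bid, "" = trade (no side letter)
    price: str  # raw price token, deliberately left undecoded
    size: int

def parse_record(line: str) -> Record:
    # Assumed layout, inferred from the samples above (not from the CME spec).
    tokens = line.split()
    match = re.fullmatch(r"(\d+)([AB]?)", tokens[2])
    if match is None:
        raise ValueError(f"unrecognised price token: {tokens[2]!r}")
    return Record(side=match.group(2), price=match.group(1), size=int(tokens[1][-5:]))

def size_discrepancy(lines: List[str]) -> int:
    """Actual minus expected size on the quote the trade hit (0 means it adds up)."""
    records = [parse_record(line) for line in lines]
    trade_at = next(i for i, r in enumerate(records) if r.side == "")
    trade = records[trade_at]
    before = {r.side: r for r in records[:trade_at]}
    after = {r.side: r for r in records[trade_at + 1:]}
    for side in ("A", "B"):
        # Only compare when the quote stayed at the trade price on both sides.
        if before[side].price == trade.price == after[side].price:
            return after[side].size - (before[side].size - trade.size)
    raise ValueError("trade price does not match an unchanged quote level")

FIRST = """\
2010011509564006628780EES F100300900 201132252A M M100115
2010011509564006628780EES F100300157 201132002B M M100115
2010011509564006628790EES F100300038 201132002 100115
2010011509564006628800EES F100300900 201132252A M M100115
2010011509564006628800EES F100300120 201132002B M M100115
""".splitlines()

SECOND = """\
2010011509564006628980EES F100300870 201132252A M M100115
2010011509564006628980EES F100300098 201132002B M M100115
2010011509564006628990EES F100300001 201132002 100115
2010011509564006629000EES F100300870 201132252A M M100115
2010011509564006629000EES F100300096 201132002B M M100115
""".splitlines()

THIRD = """\
2010011509564006629250EES F100300083 201132002A M M100115
2010011509564006629250EES F100301185 201131752B M M100115
2010011509564006629260EES F100300001 201132002 100115
2010011509564006629270EES F100300076 201132002A M M100115
2010011509564006629270EES F100301183 201131752B M M100115
""".splitlines()

for name, group in (("first", FIRST), ("second", SECOND), ("third", THIRD)):
    print(name, size_discrepancy(group))
```

The three groups come out at +1, -1 and -6 respectively: the first bid is one larger than expected, the second one smaller, and in the third the trade hit the ask, which dropped by 7 on a trade of 1.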

 

At this point, I'm going to assume then that probably the only possible way of assessing the accuracy of a data feed is to collect the data, purchase CME data for the same day, and run a check to ensure every tick was received in the same order and context.
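A rough sketch of that check in Python, assuming both the purchased CME data and the captured feed have already been normalized into the same list of `(kind, price, size)` tuples. The normalization is the hard part and is not shown here; `compare_ticks` is a made-up name. The standard library's `difflib.SequenceMatcher` then reports every stretch where the two sequences differ in content or order.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Tick = Tuple[str, int, int]  # (kind, price, size), a hypothetical normalized form

def compare_ticks(reference: List[Tick], captured: List[Tick]):
    """Return (tag, reference_slice, captured_slice) for every non-matching stretch.

    tag is "delete" (missing from the capture), "insert" (extra in the capture)
    or "replace" (the two sides differ).
    """
    matcher = SequenceMatcher(a=reference, b=captured, autojunk=False)
    return [
        (tag, reference[i1:i2], captured[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

reference = [("T", 144375, 9), ("T", 144375, 5), ("T", 144375, 6), ("B", 144325, 8)]
captured = [("T", 144375, 9), ("T", 144375, 6), ("B", 144325, 8)]  # one trade dropped

for difference in compare_ticks(reference, captured):
    print(difference)
```

In this toy example the only difference reported is the dropped trade of 5. `SequenceMatcher` can take quadratic time in the worst case, so for a full day of ticks it may be more practical to compare in chunks (per minute, say) than in one pass.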


I posted about this exact issue a month or two ago on here. I spoke with IQ feed a week or two ago and they claimed two important things:

 

1. Timestamps come from the exchange -- the exchange provides the timestamps, which prevents market events from being *logged* out of order (but it's still possible for you to *receive* events out of chronological order... not a major problem, easy to sort out).

 

2. They claim that they never filter data (at least for CME, that's all I asked about). They report the complete datafeed from CME.
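On point 1, putting events that arrived out of order back into exchange order can be sketched in Python like this. The field names `exchange_ts` and `arrival` are hypothetical, not IQ Feed's actual message fields, and the timestamps are assumed to be fixed-width, zero-padded strings so that plain string comparison orders them correctly.

```python
from typing import Dict, List

def in_exchange_order(events: List[Dict]) -> List[Dict]:
    # sorted() is stable, so events sharing an exchange timestamp
    # keep the order in which they were received.
    return sorted(events, key=lambda event: event["exchange_ts"])

events = [
    {"arrival": 0, "exchange_ts": "09:56:40.003", "msg": "trade"},
    {"arrival": 1, "exchange_ts": "09:56:40.001", "msg": "bid"},
    {"arrival": 2, "exchange_ts": "09:56:40.001", "msg": "ask"},
]

ordered = in_exchange_order(events)
print([event["msg"] for event in ordered])
```

Note that if several events share one timestamp, sorting cannot recover their true sequence; an exchange sequence number, where the feed passes one through, would be a better sort key.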

 

I might be missing something, but that sounds as good as having a direct line to CME. That's probably why Fulcrum recommends IQ Feed, but then again, he's using a $2k/month feed so he's obviously getting something else too (and I'd love to know what's worth the extra $1800 / month :-)).

 

Can someone please send me just 1 day of ES data (any time after February) from your feed? I'd be especially interested in a sample from IQ Feed, as that's the vendor I am considering, but I'm also interested in comparing against other providers, and I promise to document my findings here for the benefit of everyone.


I don't collect intraday data so I can't help there, but I initially collected a lot of EOD data from CSI Unfair Advantage. They have a good range of products that can be bought, so you can get EOD data going back 20 years. They were recommended by others.

The data is generally very good, with a few hiccups, but as mentioned above that might be because a few days get out of sync. The real advantage they offered was a way to back-adjust continuous contracts and a fair bit of flexibility to manipulate the data into the form we required. They also offer stocks, but I did not look at them.

But as stated, something that is close enough was all I required, so I won't be much help, sorry... but I would be very interested in any final conclusions of the thread.


 

2. They claim that they never filter data (at least for CME, that's all I asked about). They report the complete datafeed from CME.

 


 

Filtering and aggregating/coalescing are not quite the same thing. They always used to reserve the right to but in the absence of current evidence I guess we can discount that. :)
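To make the distinction concrete, here is a purely illustrative Python sketch. It does not describe how the CME or any vendor actually processes messages, and `coalesce_trades` and `filter_small` are invented names. Coalescing merges consecutive trades at the same price into one message, so total volume is preserved but the message count is not; filtering drops messages outright, so volume is lost.

```python
from itertools import groupby
from typing import List, Tuple

Trade = Tuple[int, int]  # (price, size)

def coalesce_trades(trades: List[Trade]) -> List[Trade]:
    """Merge each run of consecutive trades at one price into a single trade."""
    return [
        (price, sum(size for _, size in run))
        for price, run in groupby(trades, key=lambda trade: trade[0])
    ]

def filter_small(trades: List[Trade], min_size: int) -> List[Trade]:
    """Drop trades below min_size (a hypothetical filtering rule)."""
    return [trade for trade in trades if trade[1] >= min_size]

trades = [(144375, 9), (144375, 5), (144375, 6)]

print(coalesce_trades(trades))   # one message, volume unchanged
print(filter_small(trades, 6))   # two messages, volume reduced
```

With the three trades from the first post, coalescing gives a single trade of 20, while a size filter at 6 leaves 15 of the 20 contracts.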

 

When they were talking about 'data', was that the full order book?

I don't know of any feeds which supply the entire order book. They said that they don't filter their market depth data.

 

I assume that's all he meant by order book--market depth to the same level as the exchange gives. For eMinis that's 10 levels, for other products typically 5 or whatever.

I don't know of any feeds which supply the entire order book. They said that they don't filter their market depth data.

 

Yeah, that's me being sloppy. What I meant was their limit order book to however many levels are reported (or market depth data).


I am also in pursuit of accurate data for the purpose of order book analysis, and I am seriously considering going with CQG. taotree, do you have any updates?
