NetTecture

Anyone Interested in a Complete CME Feed Sample?

Hello folks ;)

 

As part of the development process for my trading platform, I am testing my data collection system next week. This is a stress test and will take 1-2 weeks - I cannot rule out that I will have to restart or hit a crash in the first week, and I want a complete week's sample.

 

Anyhow, I will collect a complete data feed from CME electronic trading (basically the same thing you get via zen-fire, just COMPLETE - not filtered like zen-fire is, and not limited by symbol).

 

This will include:

* Pre-opening, opening, closing etc. information

* Trades, obviously

* Best bid, best ask, and the full book of bids AND asks, including their invalidation (the bid/ask volume is set to 0 when the price moves out of the tracked range and the exchange no longer tracks that level). See the sketch right after this list for how a consumer could apply these updates.
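
For those who want to consume the book data: a minimal sketch of how the updates could be applied on the receiving side. The type names and the idea of keying levels by price are just illustration for this post, not my actual collector code:

using System;
using System.Collections.Generic;

// Sketch only: structure and names are illustrative.
enum UpdateKind { Bid, Ask, BestBid, BestAsk }

class BookSide
{
    // price -> size; a size of 0 in the feed means the exchange no longer
    // tracks that level, so it gets dropped from the book.
    private readonly SortedDictionary<decimal, long> levels = new SortedDictionary<decimal, long>();

    public void Apply(long size, decimal price)
    {
        if (size == 0)
            levels.Remove(price);   // invalidation: level moved out of the tracked range
        else
            levels[price] = size;   // insert or update the level
    }

    public SortedDictionary<decimal, long> Levels { get { return levels; } }
}

class InstrumentBook
{
    public readonly BookSide Bids = new BookSide();
    public readonly BookSide Asks = new BookSide();
    public decimal BestBidPrice, BestAskPrice;
    public long BestBidSize, BestAskSize;

    public void Apply(UpdateKind kind, long size, decimal price)
    {
        switch (kind)
        {
            case UpdateKind.Bid: Bids.Apply(size, price); break;
            case UpdateKind.Ask: Asks.Apply(size, price); break;
            case UpdateKind.BestBid: BestBidPrice = price; BestBidSize = size; break;
            case UpdateKind.BestAsk: BestAskPrice = price; BestAskSize = size; break;
        }
    }
}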

 

This will include the following instruments:

* Naturally all futures.

* All options, as published by the exchange. This is the majority of the data.

* Virtual instruments, such as spread trades that are executed at the exchange level. Most people won't know it, but you can execute those on the exchange, which then guarantees the spread value. Those CAN be ignored - spread executions also show up as separate leg trades in the individual instruments.

 

The timestamps are in microsecond format from the original data capture equipment. The stream is totally unedited. It does NOT include cancels and corrections, to my knowledge. I record it to get an idea of the volume, to validate my data capture approach, and to get an idea of how to actually process that huge amount of data - as a stress test for my equipment.

 

Anyone interested in a copy (ONLY of the complete data) can contact me via PM. This is initially a free offer, but if the volume gets too high I may have to ask for a small contribution towards the data transfer. I cannot have 20 people downloading 50gb or so for free ;) We are talking SMALL here - basically covering my traffic costs. Not sure how large the archive will be - ask me in 2 weeks. Anyone else is free to put the archive up on peer to peer - I may even do so myself, but with very limited bandwidth ;)

 

The file format will be a tab separated Windows text file (CRLF delimiters), naturally compressed, most likely with 7zip ;) Expect SIGNIFICANT archives. We are talking 30,000+ lines PER SECOND in the hot phases. Even during the night the lines are often not readable on a text output because they scroll by so fast - every trade results in a lot of bid adjustments. It will contain two file sets - one with the exchange prints, one with instrument metadata (name, allowed prices etc.). I may actually have to split the allowed prices out into a third file for technical reasons.

C# code for parsing the files will be available, unless it turns out to be trivial (so basically for the print files, where not all lines will have the same layout).
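
To give an idea of what to expect, here is a rough sketch of what such a parser could look like. The column layout assumed here (timestamp, symbol, exchange, update type, size, price) is just my current working layout and not final:

using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Sketch only - assumes one tab-separated record per line:
// timestamp, symbol, exchange, update type, size, price.
class PrintRecord
{
    public string Timestamp;   // kept as raw text here
    public string Symbol;
    public string Exchange;
    public string Kind;        // Bid, Ask, BestBid, BestAsk, Trade, ...
    public long Size;
    public decimal Price;
}

static class PrintFileReader
{
    public static IEnumerable<PrintRecord> Read(string path)
    {
        foreach (string line in File.ReadLines(path))
        {
            string[] f = line.Split('\t');
            if (f.Length < 6)
                continue;   // ignore malformed or non-print lines

            yield return new PrintRecord
            {
                Timestamp = f[0],
                Symbol = f[1],
                Exchange = f[2],
                Kind = f[3],
                Size = long.Parse(f[4], CultureInfo.InvariantCulture),
                Price = decimal.Parse(f[5], CultureInfo.InvariantCulture)
            };
        }
    }
}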

At a later stage I will possibly also have a binary format available - part of the data capture approach is to find out the actual range of values for some items. I capture them text encoded first so I can design a proper, efficient binary representation - I plan on storing the complete data stream in a server for real-time retrieval.

 

The offer is a one-time thing. I have no intention of getting into the data provider business with this. I just think it may be valuable for someone working on his own technology to have access to a high end stream to see what really goes on - and possibly stress test his technology. Other sources of that data are - hm - hard to find or hard to pay for ;)


Ok, status update ;)

 

Change of heart. Well, not really.

 

* Data collector running since yesterday around 1900 UTC (for those not knowledgeable: that is Greenwich Mean Time WITHOUT summer/winter time - computers use it internally).

* The log file so far looks like this:

 

2009-08-24 04:26:24.149830 ZDU9 CME Bid 0 9534
2009-08-24 04:26:24.149830 ZDU9 CME Bid 10 9533
2009-08-24 04:26:24.150741 DDU9 CME Ask 2 9572
2009-08-24 04:26:24.150741 DDU9 CME Bid 0 9530
2009-08-24 04:26:24.150741 DDU9 CME Bid 0 9547
2009-08-24 04:26:24.150741 DDU9 CME Bid 2 9546
2009-08-24 04:26:24.154835 ZDU9 CME Bid 2 9539
2009-08-24 04:26:24.154835 ZDU9 CME Bid 12 9538
2009-08-24 04:26:24.154983 DDU9 CME Bid 3 9529
2009-08-24 04:26:24.155119 YMU9 CME Bid 4 9546
2009-08-24 04:26:24.155385 ZDU9 CME Bid 0 9547
2009-08-24 04:26:24.155385 ZDU9 CME Bid 1 9528
2009-08-24 04:26:24.157930 DDU9 CME Bid 0 9548
2009-08-24 04:26:24.157930 DDU9 CME Bid 2 9547
2009-08-24 04:26:24.157930 DDU9 CME BestBid 2 9547
2009-08-24 04:26:24.158331 YMZ9 CME Bid 2 9487
2009-08-24 04:26:24.158331 YMZ9 CME Bid 3 9486
2009-08-24 04:26:24.158331 YMZ9 CME Bid 0 9384
2009-08-24 04:26:24.158331 YMZ9 CME Bid 1 9487
2009-08-24 04:26:24.158331 YMZ9 CME BestBid 1 9487
2009-08-24 04:26:24.158483 ZDU9 CME Bid 0 9548
2009-08-24 04:26:24.158483 ZDU9 CME Bid 2 9547
2009-08-24 04:26:24.164320 DDU9 CME Bid 0 9529
2009-08-24 04:26:24.172657 ESU9 CME Bid 91 1030.75
2009-08-24 04:26:24.172657 ESU9 CME Bid 100 1030.5
2009-08-24 04:26:24.175702 ESU9 CME Bid 76 1029.75
2009-08-24 04:26:24.175702 ESU9 CME Bid 128 1029.5
2009-08-24 04:26:24.179895 YMZ9 CME Bid 0 9482
2009-08-24 04:26:24.179895 YMZ9 CME Bid 10 9480
2009-08-24 04:26:24.183040 ZDH0 CME Bid 0 9389
2009-08-24 04:26:24.183040 ZDH0 CME Bid 5 9386
2009-08-24 04:26:24.183040 ZDH0 CME BestBid 5 9386
2009-08-24 04:26:24.199314 ZDZ9 CME Bid 0 9483
2009-08-24 04:26:24.199314 ZDZ9 CME Bid 1 9481
2009-08-24 04:26:24.199314 ZDZ9 CME BestBid 1 9481
2009-08-24 04:26:24.200362 6JU9 CME Ask 21 0.01055
2009-08-24 04:26:24.200362 6JU9 CME BestAsk 21 0.01055
2009-08-24 04:26:24.200463 6JU9 CME Bid 18 0.010548
2009-08-24 04:26:24.200463 6JU9 CME BestBid 18 0.010548
2009-08-24 04:26:24.220230 6JU9 CME Ask 1 0.010549
2009-08-24 04:26:24.220230 6JU9 CME Ask 0 0.010554
2009-08-24 04:26:24.220230 6JU9 CME Ask 83 0.010552
2009-08-24 04:26:24.220230 6JU9 CME BestAsk 1 0.010549

 

It is slightly bad: the hour is in 12-hour format instead of 24-hour. Not too bad for my purposes though. The main problem: it is already around 800 MB big ;) I won't have the space for a whole week on that particular drive.
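
One possible workaround for the 12-hour stamps, since the file is written in chronological order: parse the stamp as-is and shift forward by 12 hours whenever time appears to run backwards. A quick sketch of that idea (it assumes records are never more than 12 hours apart, which holds for this feed):

using System;
using System.Globalization;

// Sketch: recover 24-hour timestamps from the 12-hour stamps, relying on the
// fact that the log is written in chronological order.
class TimestampFixer
{
    private DateTime last = DateTime.MinValue;

    public DateTime Parse(string stamp)
    {
        // "hh" is the 12-hour field; without an AM/PM marker it parses as AM.
        DateTime t = DateTime.ParseExact(stamp, "yyyy-MM-dd hh:mm:ss.ffffff",
                                         CultureInfo.InvariantCulture);

        // If the clock seems to run backwards we crossed noon, so shift
        // forward by 12 hours until the order is restored.
        while (t < last)
            t = t.AddHours(12);

        last = t;
        return t;
    }
}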

 

I will stop collecting at the end of the day when the ES has closed, or tomorrow - depending on disc usage today. I will then finish creating the other files needed (mostly the descriptions and pricing steps) tomorrow and make the archive available.

 

I THEN go on with a binary representation, most likely based on delta storage (after all, things do not change that much between prints in one symbol, so most of the time I can put the price delta in one byte). The idea is to store the information per symbol in time slices of maybe 5 minutes (or more, depending on the amount of data I really get) as binary fields ;) Then I will see what loading that into my SQL Server says... (which has a lot more space than the drive set aside NOW for the log - I seriously did not expect THAT much data). The textual log file is simply waaaaayyyy too large. I will possibly also reduce the timestamp granularity to about millisecond resolution, maybe less. I do not really see a need for better granularity than about 25ms (which is what NxCore uses, too).
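
To illustrate the direction I am thinking of (nothing final - the framing with a one-byte escape marker is just an assumption for this sketch), a per-symbol delta writer could look roughly like this:

using System;
using System.IO;

// Sketch of per-symbol delta storage: write the difference to the previous
// print, using a single byte in the common case and an escape marker plus a
// full 8-byte value when the delta does not fit. A reader would mirror this.
class DeltaWriter
{
    private readonly BinaryWriter writer;
    private long lastMilliseconds;   // timestamp, ms since the start of the time slice
    private long lastPriceTicks;     // price expressed in integer price steps

    public DeltaWriter(Stream output)
    {
        writer = new BinaryWriter(output);
    }

    public void Write(long milliseconds, long priceTicks, int size)
    {
        WriteDelta(milliseconds - lastMilliseconds);
        WriteDelta(priceTicks - lastPriceTicks);
        writer.Write(size);

        lastMilliseconds = milliseconds;
        lastPriceTicks = priceTicks;
    }

    private void WriteDelta(long delta)
    {
        if (delta >= -127 && delta <= 127)
        {
            writer.Write((sbyte)delta);      // common case: one byte
        }
        else
        {
            writer.Write(sbyte.MinValue);    // escape marker (-128)
            writer.Write(delta);             // full value
        }
    }
}

The size field stays a plain int in this sketch; if most sizes turn out to be small as well, the same escape trick would apply there too.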

 

I will most likely start collecting again next week (I actually want to make a full collection for my own purposes starting 1st of September), but I may filter the data - I simply do not need data on virtual instruments (such as spreads), and I don't exactly have a large need for options either, so I may filter out the order book there, keeping only best bid and ask ;)

 

For those interested, btw: CPU utilization on my collecting station is really low so far, and network IO is around 1.6 megabytes per minute - that is around 200 kbit/s (1.6 MB x 8 bits / 60 s is roughly 0.2 Mbit/s). I will post another update after market open - that is when it gets interesting.

