2. Data Preparation

The data fetchers are crawler modules for pricing data, fundamental data, and other market data such as market value. They let us collect all of the data we need from Tushare or other sources.

2.1. Proxy Pool

To fetch fundamental data, we need to set up a proxy pool for the fund crawler.

Some recommended proxy pool projects:

dungproxy https://github.com/virjar/dungproxy
proxyspider https://github.com/zhangchenchen/proxyspider
ProxyPool https://github.com/henson/ProxyPool
ProxyPool https://github.com/WiseDoge/ProxyPool
IPProxyTool https://github.com/awolfly9/IPProxyTool
IPProxyPool https://github.com/qiyeboy/IPProxyPool
proxy_list https://github.com/gavin66/proxy_list
proxy_pool https://github.com/lujqme/proxy_pool
haipproxy https://github.com/SpiderClub/haipproxy
proxy_pool https://github.com/jhao104/proxy_pool

Note

factorset uses proxy_pool; see Proxy_start for details. You may also need to install Redis for your proxy pool.

2.2. Configuration file

[TARGET]
;all = all
;all = hs300
all = 000001.SZ, 000002.SZ

[OPTIONS]
MONGO = False
CSV = True
;succ and fail list of symbol
SFL = True

[STORE]
hqdir = ./hq
;Only fund crawler will use proxy_pool
funddir = ./fund
otherdir =  ./other

[ARCTIC]
;host = 127.0.0.1:27017
host = localhost

[READ]
proxypool = 127.0.0.1:5010
proxymin = 5
encode = gbk
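
As a rough illustration of how these settings might be consumed, the sketch below parses a copy of the configuration with Python's standard configparser module. Whether factorset itself reads the file this way is an assumption, and the helper name load_settings and the dictionary keys are hypothetical. Lines starting with ";" are treated as comments by configparser, which matches the commented-out alternatives in [TARGET].

```python
import configparser

# Inline copy of part of the configuration shown above. In practice the text
# would be read from the configuration file, whose name is not given here.
CONFIG_TEXT = """
[TARGET]
;all = hs300
all = 000001.SZ, 000002.SZ

[OPTIONS]
MONGO = False
CSV = True
SFL = True

[STORE]
hqdir = ./hq
funddir = ./fund
otherdir =  ./other

[READ]
proxypool = 127.0.0.1:5010
proxymin = 5
encode = gbk
"""


def load_settings(text):
    """Parse the INI text into a plain dict (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw_target = parser.get("TARGET", "all")
    return {
        # A comma-separated value is split into symbols; a single keyword
        # such as "hs300" comes back as a one-element list.
        "symbols": [s.strip() for s in raw_target.split(",") if s.strip()],
        "use_mongo": parser.getboolean("OPTIONS", "MONGO"),
        "use_csv": parser.getboolean("OPTIONS", "CSV"),
        "fund_dir": parser.get("STORE", "funddir"),
        "proxy_pool": parser.get("READ", "proxypool"),
        "proxy_min": parser.getint("READ", "proxymin"),
    }


settings = load_settings(CONFIG_TEXT)
print(settings["symbols"])
```

Note that configparser lower-cases option names by default, so MONGO and mongo refer to the same option.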

2.3. Run Fetcher

from factorset.Run import data_fetch
data_fetch.data_fetch()
Start Fetching Stock & Index Data!
000001.SZ写入完成
000002.SZ写入完成
Finish Fetching Stock & Index Data!

Start Fetching Other Data!
2017-06-15写入完成
2017-06-16写入完成
2017-06-19写入完成
2017-06-20写入完成
2017-06-21写入完成
2017-06-22写入完成
....
[Getting data:]##################
Finish Fetching Other Data!

Start Fetching Fundamental Data!
proxy: http://xxxx
proxy: http://xxxx
Put 000001.SZ in queue!
proxy: http://xxxx
Put 000001.SZ in queue!
proxy: http://xxxx
000002.SZ写入84条数据,
000001.SZ写入88条数据,
BS表数据导入成功!

proxy: http://xxxx
proxy: http://xxxx
Cannot connect to host xxxx ssl:False [Connect call failed ('xxxx', xxxx)]
Put 000001.SZ in queue!
proxy: http://xxxx
Put 000001.SZ in queue!
proxy: http://xxxx
000002.SZ写入85条数据,
000001.SZ写入91条数据,
IS表数据导入成功!

proxy: http://xxxx
proxy: http://xxxx
Put 000001.SZ in queue!
Put 000002.SZ in queue!
proxy: http://xxxx
proxy: http://xxxx
000001.SZ写入72条数据,
000002.SZ写入71条数据,
CF表数据导入成功!

Finish Fetching Fundamental Data!

Note

Make sure the number of proxies in your proxy pool is not less than the proxymin value set in the [READ] section of the configuration file.
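
The comparison between the pool size and proxymin can be sketched as below. This is only an illustration: it assumes the pool exposes an HTTP endpoint /get_all/ that returns a JSON array of proxies, which may not match the proxy pool project or version you run, and the function names count_proxies and pool_is_ready are hypothetical rather than part of factorset. The HTTP call is injectable so the logic can be tried without a running pool.

```python
import io
import json
from urllib.request import urlopen


def count_proxies(pool_address, opener=urlopen):
    """Return how many proxies the pool reports.

    Assumption: GET http://<pool_address>/get_all/ returns a JSON array.
    """
    url = f"http://{pool_address}/get_all/"
    with opener(url, timeout=5) as response:
        proxies = json.loads(response.read().decode("utf-8"))
    return len(proxies)


def pool_is_ready(pool_address, proxy_min, opener=urlopen):
    """True when the pool holds at least proxy_min proxies."""
    return count_proxies(pool_address, opener) >= proxy_min


def fake_opener(url, timeout=None):
    """Stand-in for urlopen that answers with two made-up proxies."""
    body = json.dumps(["10.0.0.1:8080", "10.0.0.2:8080"]).encode("utf-8")
    return io.BytesIO(body)


# Offline demonstration: two proxies are fewer than proxymin = 5.
print(pool_is_ready("127.0.0.1:5010", 5, opener=fake_opener))  # False
```

Against a real pool, the same check would be called as pool_is_ready("127.0.0.1:5010", 5), using the proxypool and proxymin values from the configuration file.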

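import os
from factorset.data import CSVParser as cp
# Concatenate the stored balance sheet ('BS') data for all fetched symbols, then select the ticker and a few item columns.
root = os.path.abspath('.')
cp.concat_fund(root, cp.all_fund_symbol(root, 'BS'), 'BS').loc[:, ['ticker', 121, 101, 68, 109, 0]]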
date        ticker     121       101       68        109       0
2018/3/31   000002.SZ  1.22E+12  8.81E+11  3.42E+09  7.80E+09  4.11E+10
2017/12/31  000002.SZ  1.17E+12  8.47E+11  3.33E+09  1.61E+10  4.62E+10
2017/9/30   000002.SZ  1.02E+12  7.43E+11  2.79E+09  1.73E+10  3.69E+10
2017/6/30   000002.SZ  9.29E+11  6.76E+11  2.22E+09  1.34E+10  3.67E+10
2017/3/31   000002.SZ  8.87E+11  6.28E+11  2.63E+09  1.59E+10  3.25E+10
2016/12/31  000002.SZ  8.31E+11  5.80E+11  3.60E+09  1.66E+10  2.68E+10
2016/9/30   000002.SZ  7.56E+11  5.53E+11  5.35E+09  6.84E+09  2.41E+10
2016/6/30   000002.SZ  7.12E+11  5.10E+11  8.34E+09  2.85E+09  2.64E+10
2016/3/31   000002.SZ  6.59E+11  4.57E+11  1.15E+10  3.30E+09  2.37E+10
2015/12/31  000002.SZ  6.11E+11  4.20E+11  1.67E+10  1.90E+09  2.47E+10
2015/9/30   000002.SZ  5.71E+11  4.02E+11  1.83E+10  7.63E+08  2.34E+10
2015/6/30   000002.SZ  5.37E+11  3.79E+11  2.16E+10  4.77E+08  2.33E+10
2015/3/31   000002.SZ  5.26E+11  3.69E+11  2.47E+10  1.02E+09  2.26E+10
2014/12/31  000002.SZ  5.08E+11  3.46E+11  2.13E+10  2.38E+09  2.04E+10
2014/9/30   000002.SZ  5.20E+11  3.56E+11  2.13E+10  6.03E+09  1.38E+10
2014/6/30   000002.SZ  5.02E+11  3.36E+11  1.66E+10  6.58E+09  1.57E+10
2014/3/31   000002.SZ  4.95E+11  3.41E+11  1.40E+10  4.38E+09  2.35E+10
2013/12/31  000002.SZ  4.79E+11  3.29E+11  1.48E+10  5.10E+09  2.75E+10
2013/9/30   000002.SZ  4.61E+11  3.30E+11  1.32E+10  1.25E+10  2.64E+10
2013/6/30   000002.SZ  4.32E+11  3.14E+11  1.09E+10  1.20E+10  3.29E+10
2013/3/31   000002.SZ  4.18E+11  2.99E+11  7.87E+09  1.31E+10  3.10E+10
2012/12/31  000002.SZ  3.79E+11  2.60E+11  4.98E+09  9.93E+09  2.56E+10
2012/9/30   000002.SZ  3.48E+11  2.34E+11  1.69E+09  4.48E+09  1.41E+10
2012/6/30   000002.SZ  3.30E+11  2.17E+11  2.72E+08  4.80E+09  1.55E+10
2012/3/31   000002.SZ  3.10E+11  2.03E+11  0.00E+00  2.04E+09  1.58E+10
2011/12/31  000002.SZ  2.96E+11  2.01E+11  3.13E+07  1.72E+09  2.18E+10
2011/9/30   000002.SZ  2.83E+11  1.95E+11  3.13E+07  9.21E+08  2.28E+10
2011/6/30   000002.SZ  2.61E+11  1.73E+11  0.00E+00  1.61E+09  2.14E+10
2011/3/31   000002.SZ  2.35E+11  1.52E+11  0.00E+00  1.72E+09  1.86E+10
2010/12/31  000002.SZ  2.16E+11  1.30E+11  0.00E+00  1.48E+09  1.53E+10
...         ...        ...       ...       ...       ...       ...
2004/9/30   000002.SZ  1.58E+10  7.06E+09  0.00E+00  2.81E+09  6.00E+07
2004/6/30   000002.SZ  1.32E+10  6.43E+09  4.00E+06  3.24E+09  1.60E+08
2004/3/31   000002.SZ  1.18E+10  5.76E+09  3.15E+09  0.00E+00  1.60E+08
2003/12/31  000002.SZ  1.06E+10  4.80E+09  3.00E+06  1.68E+09  1.60E+08
2003/9/30   000002.SZ  1.02E+10  4.88E+09  9.72E+06  1.56E+09  0.00E+00
2003/6/30   000002.SZ  9.47E+09  4.17E+09  9.69E+06  1.04E+09  0.00E+00
2003/3/31   000002.SZ  8.32E+09  3.14E+09  9.69E+06  5.20E+08  0.00E+00
2002/12/31  000002.SZ  8.22E+09  3.08E+09  9.69E+06  4.60E+08  0.00E+00
2002/9/30   000002.SZ  8.21E+09  3.18E+09  5.75E+08  7.60E+08  0.00E+00
2002/6/30   000002.SZ  8.22E+09  3.27E+09  0.00E+00  1.12E+09  0.00E+00
2002/3/31   000002.SZ  6.60E+09  3.23E+09  0.00E+00  1.50E+09  0.00E+00
2001/12/31  000002.SZ  6.48E+09  3.05E+09  0.00E+00  1.35E+09  0.00E+00
2001/6/30   000002.SZ  6.07E+09  2.73E+09  0.00E+00  9.86E+08  0.00E+00
2000/12/31  000002.SZ  5.58E+09  2.53E+09  0.00E+00  5.66E+08  0.00E+00
2000/6/30   000002.SZ  5.05E+09  2.06E+09  0.00E+00  6.96E+08  2.00E+07
1999/12/31  000002.SZ  4.49E+09  2.29E+09  0.00E+00  8.95E+08  0.00E+00
1999/6/30   000002.SZ  4.34E+09  2.11E+09  0.00E+00  8.15E+08  0.00E+00
1998/12/31  000002.SZ  4.04E+09  1.92E+09  0.00E+00  6.19E+08  0.00E+00
1998/6/30   000002.SZ  3.95E+09  1.93E+09  0.00E+00  5.74E+08  0.00E+00
1997/12/31  000002.SZ  3.96E+09  1.97E+09  0.00E+00  5.48E+08  0.00E+00
1997/6/30   000002.SZ  3.62E+09  2.04E+09  0.00E+00  0.00E+00  0.00E+00
1996/12/31  000002.SZ  3.47E+09  1.94E+09  0.00E+00  8.28E+08  4.05E+07
1996/6/30   000002.SZ  3.22E+09  1.66E+09  0.00E+00  0.00E+00  0.00E+00
1995/12/31  000002.SZ  3.23E+09  1.75E+09  0.00E+00  6.47E+08  3.01E+07
1995/6/30   000002.SZ  2.83E+09  1.57E+09  0.00E+00  0.00E+00  0.00E+00
1994/12/31  000002.SZ  2.68E+09  1.44E+09  0.00E+00  4.08E+08  2.49E+05
1994/6/30   000002.SZ  2.24E+09  1.19E+09  0.00E+00  0.00E+00  0.00E+00
1993/12/31  000002.SZ  2.14E+09  1.14E+09  0.00E+00  3.37E+08  0.00E+00
1992/12/31  000002.SZ  9.63E+08  7.05E+08  0.00E+00  1.97E+08  0.00E+00
1970/1/1    000002.SZ  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00

2.4. Fund Dict

Balance Sheet

Income Statement

Cash Flow Statement