4. API Reference

4.1. Data Fetchers

Stock Data

factorset.Run.data_fetch.data_fetch()[source]

Reads settings from the config and crawls market quotes, fundamentals, and other data.

factorset.data.StockSaver.write_all_stock(allAshare, lib=None)[source]
Parameters:
  • allAshare – List, all stocks
  • lib – arctic.store.version_store.VersionStore
Returns:

succ: List, stocks written successfully; fail: List, stocks that failed to write
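The succ/fail return value suggests a write loop that records each outcome instead of stopping at the first error. A minimal sketch of that pattern follows. The names `write_all` and `write_one` are hypothetical and not part of factorset, and the in-memory dict only stands in for an Arctic library.

```python
def write_all(symbols, write_one):
    """Try to write every symbol; collect successes and failures separately."""
    succ, fail = [], []
    for symbol in symbols:
        try:
            write_one(symbol)
        except Exception:
            # Record the failure and continue with the remaining symbols.
            fail.append(symbol)
        else:
            succ.append(symbol)
    return succ, fail


# Hypothetical writer: stores into a dict and rejects one symbol on purpose.
store = {}


def write_one(symbol):
    if symbol == "000002":
        raise IOError("simulated write failure")
    store[symbol] = "rows for " + symbol


succ, fail = write_all(["000001", "000002", "600000"], write_one)
print(succ, fail)
```

Returning both lists lets the caller retry only the failed symbols.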

factorset.data.StockSaver.save_index(symbol)[source]

Fetches index quotes from Tushare, e.g. 000905 (CSI 500) and 000300 (CSI 300).

Parameters: symbol – index code

Other Data

factorset.data.OtherData.market_value(dir, tickers)[source]

Reads and computes total market capitalization.

Parameters:
  • dir – str, storage path of the other data
  • tickers – list, stock codes
Returns:

pd.DataFrame: total market capitalization

factorset.data.OtherData.tradecal(startday=None, endday=None)[source]

Trading calendar.

Parameters:
  • startday – defaults to '2017-06-15'
  • endday – defaults to the most recent trading day
Returns:

list, trading calendar

Fundamental Data

class factorset.data.FundCrawler.FundCrawler(TYPE)[source]

FundCrawler class; crawls fundamental data using coroutines.

consume(queue)[source]

Consumes tickers from the queue until all tasks are finished.

Parameters: queue – ticker queue
Returns: None

data_clean(text)[source]

Cleans the crawled text data.

Parameters: text – text data fetched by the coroutines
Returns: pd.DataFrame

fetch(queue, session, url, ticker)[source]

Crawls the fundamental data of a single ticker.

Parameters:
  • queue – ticker queue
  • session – aiohttp.ClientSession()
  • url – URL from which the stock's fundamental data is crawled
  • ticker – stock code
Returns:

text of the fundamental data

get_proxy()[source]

Gets a proxy.

Returns: proxy address

main(Ashare, num=10, retry=2)[source]

Main routine of the coroutine crawler.

Parameters:
  • Ashare (list) – tickers to crawl
  • num (int) – maximum number of coroutines
  • retry (int) – number of restarts
Returns:

None

run(queue, max_tasks)[source]

Schedule the consumer

Parameters:
  • queue – ticker queue
  • max_tasks – maximum number of coroutines
Returns:

None

write_to_MongoDB(symbol, df, source='Tushare')[source]
Parameters:
  • symbol – ticker
  • df – fundamental data of a single ticker, pd.DataFrame
  • source – str, note indicating the data source; defaults to 'Tushare'
Returns:

None
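The consume, fetch, and run methods above describe a queue-and-workers coroutine layout. The sketch below illustrates that general pattern using only the standard library. It is not the FundCrawler implementation: the signatures are simplified, `fetch` is a stand-in that makes no network request, and the `results` dict is an assumption added for illustration.

```python
import asyncio


async def fetch(ticker):
    """Stand-in for an HTTP request; a real crawler would use aiohttp here."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return "raw text for " + ticker


async def consume(queue, results):
    """Pull tickers from the queue until the worker is cancelled."""
    while True:
        ticker = await queue.get()
        try:
            results[ticker] = await fetch(ticker)
        finally:
            queue.task_done()


async def run(tickers, max_tasks):
    """Fill the queue, start max_tasks consumers, and wait for the queue to drain."""
    queue = asyncio.Queue()
    for ticker in tickers:
        queue.put_nowait(ticker)
    results = {}
    workers = [asyncio.create_task(consume(queue, results)) for _ in range(max_tasks)]
    await queue.join()  # returns once every ticker has been processed
    for worker in workers:
        worker.cancel()  # consumers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results


results = asyncio.run(run(["000001", "000002", "600000"], max_tasks=2))
print(sorted(results))
```

Here max_tasks caps how many fetches are in flight at once, which matches the "maximum number of coroutines" role described for num and max_tasks.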

4.2. Data Reader

The following functions can be used inside the prepare_data (recommended) and generate_factor API functions.

Stock Data

factorset.data.CSVParser.all_stock_symbol(dir)[source]
Parameters: dir (string) – data path
Returns: all stock tickers under the path

factorset.data.CSVParser.read_stock(dir, ticker)[source]
Parameters:
  • dir (string) – data path
  • ticker – a single stock ticker
Returns:

quotes of a single stock, pd.DataFrame
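To illustrate what all_stock_symbol and read_stock above do conceptually, here is a standard-library sketch. It assumes one CSV file per ticker named `<ticker>.csv`, which this reference does not confirm. The helpers `list_symbols` and `read_rows` and the column names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def list_symbols(data_dir):
    """Return the sorted file stems of all CSV files in data_dir."""
    return sorted(p.stem for p in Path(data_dir).glob("*.csv"))


def read_rows(data_dir, ticker):
    """Read one ticker's CSV file into a list of dict rows."""
    with open(Path(data_dir) / (ticker + ".csv"), newline="") as f:
        return list(csv.DictReader(f))


# Build a throwaway directory with two made-up files to exercise the helpers.
with tempfile.TemporaryDirectory() as data_dir:
    for ticker in ("000001", "600000"):
        with open(Path(data_dir) / (ticker + ".csv"), "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["date", "close"])
            writer.writerow(["2017-06-15", "10.0"])
    symbols = list_symbols(data_dir)
    rows = read_rows(data_dir, "000001")

print(symbols, rows)
```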

factorset.data.CSVParser.concat_stock(dir, tickers)[source]

Vertically concatenates the quotes of the specified stocks in the directory.

Parameters:
  • dir (string) – data path
  • tickers – list of stock tickers
Return type:

pd.DataFrame

factorset.data.CSVParser.concat_all_stock(dir)[source]

Vertically concatenates the quotes of all stocks in the directory.

Parameters: dir (string) – data path
Returns: pd.DataFrame

factorset.data.CSVParser.hconcat_stock_series(hq, tickers)[source]

Horizontally concatenates stock quotes.

Parameters:
  • hq (pd.DataFrame) – the DataFrame returned by concat_all_stock
  • tickers (list) – stock tickers
Return type:

pd.DataFrame
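The difference between the vertical and horizontal merges above can be shown with plain pandas. This is only a sketch of the two shapes, not the CSVParser code. The column names `date`, `ticker`, and `close` and all values are assumptions made for illustration.

```python
import pandas as pd

# Made-up quotes for two tickers (values are illustrative only).
a = pd.DataFrame({"date": ["2017-06-15", "2017-06-16"], "ticker": "000001", "close": [10.0, 10.5]})
b = pd.DataFrame({"date": ["2017-06-15", "2017-06-16"], "ticker": "600000", "close": [20.0, 19.5]})

# Vertical merge: stack the per-ticker frames into one long table.
long_df = pd.concat([a, b], ignore_index=True)

# Horizontal merge: one row per date, one column per ticker.
wide_df = long_df.pivot(index="date", columns="ticker", values="close")

print(long_df.shape, wide_df.shape)
```

The long form is convenient for per-stock grouping, while the wide form aligns all tickers on a common date index for cross-sectional work.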

Other Data

factorset.data.OtherData.write_new_stocks()[source]

Fetches daily new-stock data from Tushare. Because of Tushare data limits, data can be fetched only as far as 2016-04-26.

Returns: None

factorset.data.OtherData.write_all_date(tc, lib=None)[source]
Parameters:
  • tc – List, all dates
  • lib – arctic.store.version_store.VersionStore
Returns:

succ: List, written stocks; fail: List, failed written stocks

Fundamental Data

factorset.data.CSVParser.all_fund_symbol(dir, type)[source]

Gets all tickers for one report type in the storage path.

Parameters:
  • dir (string) – data path
  • type – report type: 'BS', 'IS', or 'CF'
Returns:

tickers

Return type:

list

factorset.data.CSVParser.read_fund(dir, type, ticker)[source]

Reads the data of one report type for one stock.

Parameters:
  • dir (string) – data path
  • type – report type: 'BS', 'IS', or 'CF'
  • ticker – stock ticker, str
Return type:

pd.DataFrame

factorset.data.CSVParser.fund_collist(dir, type)[source]

Accounting items of all stocks for one report type.

Parameters:
  • dir (string) – data path
  • type – report type: 'BS', 'IS', or 'CF'
Return type:

list

factorset.data.CSVParser.concat_fund(dir, tickers, type)[source]

Vertically concatenates one type of financial report.

Parameters:
  • dir (string) – data path
  • tickers – list of stock tickers
  • type – report type: 'BS', 'IS', or 'CF'
Return type:

pd.DataFrame

4.3. Data Util

factorset.data.OtherData.code_to_symbol(code)[source]

Generates the symbol code identifier.

Parameters: code – number
Returns: str, stock code

factorset.data.OtherData.shift_date(date_str, n)[source]
Parameters:
  • date_str – date, string in 'YYYYMMDD' format
  • n – time span, int
Returns:

adjusted trading day, date
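Shifting by trading days rather than calendar days amounts to moving along a sorted trading calendar. The sketch below shows one way to do that with `bisect`. The function `shift_trading_day`, the explicit `calendar` argument, and the handling of non-trading input dates are all assumptions and may differ from what shift_date does.

```python
import bisect
from datetime import datetime


def shift_trading_day(date_str, n, calendar):
    """Move n trading days from date_str along a sorted list of 'YYYYMMDD' strings.

    If date_str is not a trading day, counting starts from the next trading day.
    """
    i = bisect.bisect_left(calendar, date_str)
    j = i + n
    if not 0 <= j < len(calendar):
        raise IndexError("shifted date falls outside the calendar")
    return datetime.strptime(calendar[j], "%Y%m%d").date()


# Hypothetical calendar; 20170617 and 20170618 are a weekend and therefore absent.
calendar = ["20170614", "20170615", "20170616", "20170619", "20170620"]
print(shift_trading_day("20170616", 1, calendar))
print(shift_trading_day("20170616", -2, calendar))
```

Because 'YYYYMMDD' strings sort in date order, the lookup works directly on strings without parsing the whole calendar.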

factorset.Util.finance.ttmContinues(report_df, label)[source]

Computes Trailing Twelve Months (TTM) values for multiple indicators.

Computation rules:
  1. The TTM indicator is computed on the announcement date.
  2. On a given release_date, the latest report_date and the previous report year are used for the computation.
  3. If any report period is missing, a weighted method is used.
  4. If two reports (usually the first-quarter and annual reports) are released together, only the latest is kept.
Parameters:
  • report_df (pandas.DataFrame) – must have 'report_date', 'release_date', and <label> columns
  • label (str) – column name of the intended indicator
Returns:

DataFrame with columns ['datetime', 'report_date', <label>+'_TTM', …]

Return type:

pandas.DataFrame

Todo

If announce_date exists, use announce_date instead of release_date (and likewise for report_date).
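Rule 2 above combines the latest report with the previous report year. One common way to do this, assuming the reported figures are cumulative within each year, is TTM = latest year-to-date value + previous annual value - previous year's value for the same period. The sketch below implements only that formula. Whether ttmContinues uses exactly this formula is not stated in this reference, and the sketch ignores release dates and the weighted fallback for missing periods. The function `ttm` and the sample figures are hypothetical.

```python
def ttm(cumulative, report_date):
    """Trailing-twelve-month value from cumulative (year-to-date) figures.

    cumulative maps 'YYYYMMDD' report dates to year-to-date values.
    """
    year, month_day = int(report_date[:4]), report_date[4:]
    current = cumulative[report_date]
    if month_day == "1231":
        return current  # an annual report already covers twelve months
    prev_annual = cumulative[f"{year - 1}1231"]
    prev_same_period = cumulative[f"{year - 1}{month_day}"]
    return current + prev_annual - prev_same_period


# Made-up figures, cumulative within each year.
cumulative = {
    "20160331": 10.0,
    "20161231": 50.0,
    "20170331": 14.0,
}
print(ttm(cumulative, "20170331"))
```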

factorset.Util.finance.ttmDiscrete(report_df, label_str, min_report_num=4)[source]
Parameters:
  • report_df (pandas.DataFrame) – must have 'report_date', 'release_date', and <label> columns
  • label_str
  • min_report_num (int) –
Returns:

DataFrame with columns ['datetime', 'report_date', <label>+'_TTM', …]

Return type:

pd.DataFrame