Author Topic: [Worker Proposal] Data solution: data structured, flexible API and fast-built method (Read 639 times)


Offline vianull

BTS worker proposal: a PostgreSQL-based data solution with structured data, a flexible API, and a fast-build method




Introduction

It is undebatable that the transparency of transaction data is one of the most significant advantages of blockchain technology. The number of operations on the BitShares chain has reached hundreds of millions and, thanks to trading bots, is still growing extremely fast. For ordinary users, the transparency, immutability, and traceability of on-chain data have therefore become largely theoretical. Yet indicators such as historical trade analysis, transaction data, and address activity are very helpful for making trading decisions and evaluating the health of assets.

Since 2017 the number of BitShares users has kept growing, and excellent block explorers have appeared one after another. However, the current explorers stop at the most basic level: they lack a full history view for each account; most of the data they show comes straight from the full-node API, so they cannot support in-depth analysis; and, for a decentralized exchange, analysis of historical trades is particularly absent.

These problems have deeper technical causes:

    1) BitShares data grows quickly and remains unstructured.
    2) Most useful data comes from self-built full nodes, whose APIs do not support multi-dimensional or customized queries.
    3) Replaying a full node takes too long: on ordinary hardware and bandwidth, more than 40 hours has become the norm.



Goal

    To address the problems above, we will build a PostgreSQL-based data solution that includes:
    1) A PostgreSQL-based plugin that stores existing on-chain data in structured form.
    2) A set of APIs designed around user needs, enabling deeper and more customized queries.
    3) A fast-build service for full nodes: we aim to bring up a full node plus the APIs above within 3 hours on an ordinary cloud server.
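Goal 1, structured storage, can be sketched as a table that pulls the commonly filtered fields of each operation into indexed columns. This is only an illustrative sketch: the real plugin targets PostgreSQL, the table and column names are hypothetical, and Python's built-in sqlite3 is used solely so the example runs without a database server.

```python
import sqlite3

# Hypothetical schema sketch: one row per on-chain operation, with the
# fields most queries filter on pulled out into indexed columns.
# (The proposal targets PostgreSQL; sqlite3 is used here only so the
# sketch is runnable without a server.)
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE operations (
        id           INTEGER PRIMARY KEY,
        block_num    INTEGER NOT NULL,
        trx_in_block INTEGER NOT NULL,
        op_type      INTEGER NOT NULL,   -- e.g. 1 = limit_order_create
        account      TEXT    NOT NULL,   -- e.g. '1.2.12345'
        asset        TEXT,               -- e.g. 'BTS'
        amount       REAL,
        ts           TEXT    NOT NULL    -- block timestamp, ISO 8601
    )
""")
conn.execute("CREATE INDEX idx_ops_account_ts ON operations (account, ts)")
conn.execute("CREATE INDEX idx_ops_type ON operations (op_type)")

# A structured store makes per-account, per-type history queries cheap:
conn.execute(
    "INSERT INTO operations VALUES "
    "(1, 100, 0, 1, '1.2.12345', 'BTS', 50.0, '2018-11-11T00:00:00')"
)
rows = conn.execute(
    "SELECT block_num, amount FROM operations "
    "WHERE account = ? AND op_type = ? ORDER BY ts",
    ("1.2.12345", 1),
).fetchall()
print(rows)  # [(100, 50.0)]
```

With per-column indexes, an account's full history or all operations of one type can be fetched without scanning the whole chain, which is the point of structuring the data.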




Current Technical Status

We have analyzed the existing technical solutions in the BTS community:

a. ElasticSearch Plugin
https://github.com/bitshares/bitshares-core/wiki/ElasticSearch-Plugin
This is an excellent plugin: it provides comprehensive unstructured storage and indexing of historical transactions and objects on BTS, and it gave us a lot of inspiration and experience.
Its shortcomings are:

1) It is somewhat heavy and still demands a high-performance server.
2) Synchronization takes too long and there is no fast-build option; as far as we know, few community members currently use the ES plugin for data query or analysis.
3) Some content (such as fill details) is stored without being parsed into structure, so certain customized queries cannot be served.

b. python wrapper
https://github.com/oxarbitrage/bitshares-explorer-api
This wrapper provides a rich set of query endpoints, backed by ElasticSearch and some self-built data. Its drawback is that the data is imported on a schedule, so it is not real time (it updates daily); as the data keeps growing, each import will take longer and may become a problem.

Summary: some very mature community projects have given us valuable experience, but the current solutions still leave problems unsolved.



Our plan:

1. Develop a PostgreSQL plugin in C++ that stores on-chain data in structured form, extracting the fields needed for specific analyses into dedicated tables.
2. Develop a set of APIs in Ruby on Rails that, beyond general queries, supports more than ten kinds of in-depth analysis (examples below).
3. Implement a backup and fast-build solution that lets users bring up a BTS full node plus the APIs above within 3 hours on an ordinary cloud server (e.g. AWS, DigitalOcean).
4. Collect the in-depth data-analysis needs of the BTS community, organize the data structures, implement a demo, and open it on bts.ai.
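The write path of the plugin in point 1 can be sketched as a small buffer that flushes operations to the database in batches rather than row by row, which is much cheaper than one INSERT per operation. The class, table, and flush threshold below are hypothetical illustrations, with sqlite3 again standing in for PostgreSQL.

```python
import sqlite3

# Sketch of the batch-write idea for a chain-to-database plugin:
# buffer parsed operations in memory and flush them in one bulk
# insert. Names and the flush threshold are illustrative only.
class OpWriter:
    def __init__(self, conn, flush_every=1000):
        self.conn = conn
        self.flush_every = flush_every
        self.buffer = []

    def add(self, op_row):
        self.buffer.append(op_row)
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        if self.buffer:
            self.conn.executemany(
                "INSERT INTO ops (block_num, op_type, account) VALUES (?, ?, ?)",
                self.buffer,
            )
            self.conn.commit()
            self.buffer.clear()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ops (block_num INTEGER, op_type INTEGER, account TEXT)")
w = OpWriter(conn, flush_every=2)
w.add((1, 0, "alice")); w.add((1, 1, "bob"))   # triggers a flush
w.add((2, 0, "carol")); w.flush()              # flush the remainder
print(conn.execute("SELECT COUNT(*) FROM ops").fetchone()[0])  # 3
```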


1. Data presentation details

1.1. Support BTS data from external exchanges (only exchanges that provide a data API)
Blockchains are decentralized global infrastructure, but most cryptocurrency trading currently happens on centralized exchanges, so transaction data is highly fragmented and a coin's real trading volume is hard to judge. By importing BTS trading-pair data from external exchanges, the new bts.ai can become a visual data-aggregation and analysis platform. We will add the following functions:

a. Summary of daily trading volume on major CEXs and the DEX

Using a few exchanges as examples, the figure above shows what we plan: a stacked bar chart of daily BTS volume per venue, so that ordinary users can see how each platform's daily volume relates to the total BTS volume.
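The aggregation behind such a stacked chart is a simple group-by over trade records, summing BTS volume per (day, venue). A minimal sketch with fabricated trade data:

```python
from collections import defaultdict

# Sketch of the aggregation behind a stacked daily-volume chart:
# sum BTS volume per (day, exchange). The trade records are made up.
trades = [
    {"day": "2018-11-11", "exchange": "DEX",       "bts": 1200.0},
    {"day": "2018-11-11", "exchange": "ExchangeA", "bts": 800.0},
    {"day": "2018-11-11", "exchange": "DEX",       "bts": 300.0},
    {"day": "2018-11-12", "exchange": "ExchangeA", "bts": 500.0},
]

volume = defaultdict(float)          # (day, exchange) -> total BTS
for t in trades:
    volume[(t["day"], t["exchange"])] += t["bts"]

daily_total = defaultdict(float)     # day -> total across all venues
for (day, _), v in volume.items():
    daily_total[day] += v

print(volume[("2018-11-11", "DEX")])   # 1500.0
print(daily_total["2018-11-11"])       # 2300.0
```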

b. Trading activity on major CEXs and the DEX (transactions per second)

Similarly, transactions per second on each platform reflect daily trading activity. This helps ordinary users pick an exchange to trade on and also reveals the trading preferences of users on different platforms.

c. Historical prices on major CEXs and the DEX

As shown above, comparing BTS prices across exchanges reveals price spreads and user preferences, which again helps ordinary users choose a trading platform.

d. Combined order-book depth across major CEXs and the DEX

We plan to merge the bid/ask depth of the different platforms so that the total depth of the current BTS market is easy to see. This helps large traders and ordinary users alike to judge trades and compare platforms.
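Merging depth across venues amounts to summing order sizes per price level and then accumulating from the best price outward. A sketch for the bid side, with fabricated order data:

```python
# Sketch of merging buy-side depth from several venues: aggregate bid
# sizes into shared price buckets, then accumulate from the best bid
# downward to get total depth at or above each price. Data is made up.
books = {
    "DEX":       [(0.30, 1000.0), (0.29, 2000.0)],   # (price, BTS size)
    "ExchangeA": [(0.30, 500.0),  (0.28, 3000.0)],
}

levels = {}
for venue, bids in books.items():
    for price, size in bids:
        levels[price] = levels.get(price, 0.0) + size

cumulative = []
running = 0.0
for price in sorted(levels, reverse=True):   # best bid first
    running += levels[price]
    cumulative.append((price, running))

print(cumulative)  # [(0.3, 1500.0), (0.29, 3500.0), (0.28, 6500.0)]
```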

1.2. Asset-related page upgrades

Smart assets and user-issued assets are strengths of BTS, but today there are too many junk assets and too many markets, and no good way to visualize the data, so it is hard for ordinary users to find quality assets on their own. The updated asset pages will add the following features:

a. Chart of smart-asset supply vs. time

How a smart asset's supply changes over time reflects users' collateralizing sentiment, and the supply of the major smart assets (such as bitCNY and bitUSD) strongly influences voting and strategy in the BTS community. Comparison with historical data is also very useful for buy/sell decisions.

b. Scatter chart of bitCNY/bitUSD call price vs. collateral amount

A scatter chart of call price vs. collateral amount clearly shows ordinary users' collateralizing sentiment, i.e. current community sentiment, and highlights the lead taken by large players. This indicator likewise has a strong influence on community voting and mood.
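Preparing the scatter data means computing a call price for every margin position. A sketch, assuming the usual BitShares relation call_price = collateral / (debt × MCR); the positions and the MCR value are fabricated:

```python
# Sketch of the data prep for a call-price vs. collateral scatter plot.
# We assume call_price = collateral / (debt * MCR); the positions and
# the MCR value below are illustrative only.
MCR = 1.75  # maintenance collateral ratio (assumed)

positions = [
    {"account": "alice", "collateral_bts": 70000.0, "debt_bitcny": 10000.0},
    {"account": "bob",   "collateral_bts": 26250.0, "debt_bitcny": 5000.0},
]

points = []
for p in positions:
    call_price = p["collateral_bts"] / (p["debt_bitcny"] * MCR)  # BTS per bitCNY
    points.append((call_price, p["collateral_bts"]))

print(points)  # [(4.0, 70000.0), (3.0, 26250.0)]
```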

c. Charts of the number of active addresses (addresses trading that day) vs. time,
                     the number of dormant addresses (no trade in 30 days) vs. time, and
                     the number of new holding addresses vs. time

The number of active trading addresses reflects DEX users' enthusiasm and, to some extent, how the market feels about the current price and the size of the community. The numbers of dormant addresses and of new holding addresses are equally important indicators.
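The classification itself is straightforward once each address's last trade date is known. A sketch with made-up dates, using the 30-day dormancy threshold from above:

```python
from datetime import date

# Sketch of the address-activity classification described above:
# an address is "active" if it traded today and "dormant" if its last
# trade is more than 30 days old. The last-trade dates are made up.
today = date(2018, 11, 11)
last_trade = {
    "alice": date(2018, 11, 11),
    "bob":   date(2018, 9, 1),
    "carol": date(2018, 11, 1),
}

active  = [a for a, d in last_trade.items() if d == today]
dormant = [a for a, d in last_trade.items() if (today - d).days > 30]

print(sorted(active))   # ['alice']
print(sorted(dormant))  # ['bob']
```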

d. Ranking of holding addresses

The distribution of holding addresses is an important measure of how an asset is spread; highly concentrated assets carry a potential manipulation risk. The new bts.ai will update and polish the holding-address ranking for each asset, along with those accounts' operations on the asset.

e. Indicators on smart assets

Analyzing smart-asset collateral and borrow amounts helps evaluate, from several angles, an asset's borrowers and target users as well as its overall health. For example, we will show the bitCNY supply, the amount of BTS locked as collateral, and the average and median BTS-per-bitCNY collateral ratios.

f. Other data

On the asset page we also plan to show the largest single trade of the day, the number of trades that day, and similar data for easy reference by community users.

1.3. Trading-market page upgrades

On top of the existing market data, we will add the following functions:

a. Daily trading volume of the current market

b. Day-over-day changes (volume, largest trade, number of active addresses)




2. Multi-dimensional query and data export

2.1. Enhanced queries and export for DEX and on-chain data
The existing query functions on bts.ai cannot meet professional users' needs; history can only be browsed by paging through it manually. The new website will add the following functions:
      a. Query by operation type
      Users' on-chain records come in many types, such as placing orders, order matching, transfers, and voting. The site currently has no per-type query, although such a query is very useful. We will add it.
      b. Query by time range, market, and keyword
      Historical-trade search currently supports only manual paging, which is too cumbersome for users and breaks down when result sets are large. We will therefore add queries keyed on time range, asset name, and keywords.
      c. Data export
      Existing data is available only on web pages, but some users want to export it to tools such as Excel for further analysis. We will add an export function for query results to make this easier.
      Built on this service, we or other users can do deeper trade analysis: an overview of trading within a time window, an account's average trade price in an asset, an account's profit-and-loss analysis, and so on.
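Points a-c combine into a filter-then-export pipeline. A sketch with fabricated records and hypothetical field names, exporting the filtered result as CSV via Python's standard library:

```python
import csv, io

# Sketch of the filtered-query-plus-export flow: filter a user's
# history by operation type and time range, then write the result as
# CSV for spreadsheet analysis. Records and field names are made up.
history = [
    {"ts": "2018-11-10T09:00:00", "op_type": "transfer",   "asset": "BTS",    "amount": 10.0},
    {"ts": "2018-11-11T10:00:00", "op_type": "fill_order", "asset": "bitCNY", "amount": 25.0},
    {"ts": "2018-11-12T11:00:00", "op_type": "fill_order", "asset": "BTS",    "amount": 5.0},
]

def query(records, op_type=None, start=None, end=None):
    out = records
    if op_type is not None:
        out = [r for r in out if r["op_type"] == op_type]
    if start is not None:
        out = [r for r in out if r["ts"] >= start]
    if end is not None:
        out = [r for r in out if r["ts"] < end]
    return out

rows = query(history, op_type="fill_order", start="2018-11-11T00:00:00")
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["ts", "op_type", "asset", "amount"])
writer.writeheader()
writer.writerows(rows)
print(len(rows))                       # 2
print(buf.getvalue().splitlines()[0])  # ts,op_type,asset,amount
```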



3. Data backup service:

We will regularly publish full-data backups of BTS nodes (Ubuntu only for now) to help professional users and developers build BTS full nodes quickly. In our tests this improves synchronization speed by more than 80%.
The backups will be deployed on both AWS S3 and cloud services in mainland China so that users can download them at high speed, which also strengthens BTS's disaster-recovery capability.



4. Address identification and classification

The BTS network hosts many kinds of users, including bots and exchange accounts. We want to label such accounts on the new site; the labels will appear on user pages, asset rankings, and other pages, giving ordinary users who follow trades a useful reference.
Known phishing accounts will likewise be marked prominently in the explorer to reduce the risk of users being scammed.


5. Analysis of witness price-feed behavior and historical availability of API nodes

       a. Average feed-update interval per witness
           How quickly a witness updates its price feed is essential reference data for traders operating close to the margin line; we plan to show it as a column in a table on the page.
       b. A chart of how far each witness's feed deviates from the settlement feed



Witness price feeds currently have almost no visual monitoring, yet the feed is a critical part of the BTS economy. We plan to visualize each witness's feed and how far it deviates from the average feed, as a reference.
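One way to quantify the deviation is to compare each witness's published price against the median of all current feeds, since the median feed is what the chain uses for settlement. A sketch with fabricated feed values:

```python
from statistics import median

# Sketch of the feed-deviation metric: compare each witness's
# published price to the median of all current feeds. Values are
# fabricated for illustration.
feeds = {
    "witness-a": 0.300,
    "witness-b": 0.310,
    "witness-c": 0.290,
    "witness-d": 0.400,   # an outlier worth flagging
}

ref = median(feeds.values())                         # 0.305 here
deviation = {w: (p - ref) / ref for w, p in feeds.items()}

# List witnesses from largest to smallest absolute deviation.
for w in sorted(deviation, key=lambda w: abs(deviation[w]), reverse=True):
    print(w, round(deviation[w], 4))
```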

       c. Historical availability analysis of each API node, plus current availability


6. Others

We will add further functions as appropriate, based on community feedback.



The underlying data for everything above will be supplied by the open-source project described in "Our plan". The concrete features may be adjusted according to community feedback and development progress.




Our team: the bts.ai team

We are an established team whose core developers have extensive experience in website development, and we will hire more developers in the future to speed up development.
Project Coordinator:
Contact us: https://t.me/btsai


Our team members include:
Core developers: vianull, Chen188, wukoo
Designer: Zheng
Data analyst: tiancaomei
Product manager: vianull
Operations and maintenance manager: wukoo
Tester: jie


Timeline:

Work will start on November 11, 2018, last less than 6 months, and finish by April 28, 2019. Developers will begin as soon as possible from November 11. Every function listed above will be tested before going live. We will publish a progress report every month and post development progress on the website every week; whenever a feature is stable we will push it to the bts.ai website immediately.
We expect development to proceed in stages on schedule, but website development carries some uncertainty and delays are possible. If that happens, we will publish the reason for the delay and a re-estimated timeline in the progress updates.



Fund

The project's workload is large and there is a lot of data to sort out, but we are well prepared for it. We ask to be funded according to the following table.


The funds will be held in the datasolution-worker-escrow account, a 2-of-3 multisig of @bitcrab, @jademont, and @abit. Every 4 weeks the escrow converts the received BTS into the equivalent bitCNY and returns all excess BTS to the reserve pool, as described at http://www.bitshares.foundation/worker/ (under the Escrow Worker Model). To limit the risk of a falling BTS price, if the price drops below 0.3 bitCNY/BTS we ask for 30,000 BTS per day; all excess BTS returns to the pool.
Because software development costs can only be estimated up front, our team bears part of the risk as well. The multisig escrow members are responsible for evaluating the quality of the released software. All payments will be open and transparent.
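The daily cap works out as simple arithmetic: pay the bitCNY-denominated budget in BTS at the current price, but never more than 30,000 BTS per day. The budget figure and prices below are illustrative only:

```python
# Sketch of the daily payout cap described above: at most 30,000 BTS
# per day, with any excess returned to the reserve pool. The daily
# budget in bitCNY and the prices are illustrative assumptions.
DAILY_CAP_BTS = 30_000

def daily_payout(budget_bitcny, bts_price_bitcny):
    """BTS actually paid out for one day, respecting the cap."""
    needed = budget_bitcny / bts_price_bitcny
    return min(needed, DAILY_CAP_BTS)

print(daily_payout(9000, 0.5))   # 18000.0  (under the cap)
print(daily_payout(9000, 0.25))  # 30000    (the cap binds)
```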


Please support Worker 1.14.133, [Data solution: data structured, flexible API and fast-built], proposed by the account "datasolution-worker-escrow". Thanks!
« Last Edit: November 02, 2018, 12:48:48 pm by vianull »

Offline bitcrab

support!

I believe this is a powerful solution that meets the needs of the ecosystem.

Offline vianull

Thanks for your support!
Sorry for submitting twice. Please vote for the later one: Worker 1.14.133, submitted by the account datasolution-worker-escrow.


Offline pc

Quote
a. ElasticSearch Plugin
https://github.com/bitshares/bitshares-core/wiki/ElasticSearch-Plugin
This is an excellent plugin: it provides comprehensive unstructured storage and indexing of historical transactions and objects on BTS, and it gave us a lot of inspiration and experience.
Its shortcomings are:

1) It is somewhat heavy and still demands a high-performance server.
2) Synchronization takes too long and there is no fast-build option; as far as we know, few community members currently use the ES plugin for data query or analysis.
3) Some content (such as fill details) is stored without being parsed into structure, so certain customized queries cannot be served.

What makes you think that PostgreSQL will be better in that regard?
* ES scales easily and was designed for cluster installations; Postgres wasn't.
* Synchronization time for PostgreSQL will most likely be even longer than for ES.
* Operation details will be structured in the ES plugin in an upcoming release.

Offline vianull

Quote

What makes you think that PostgreSQL will be better in that regard?
* ES scales easily and was designed for cluster installations; Postgres wasn't.
* Synchronization time for PostgreSQL will most likely be even longer than for ES.
* Operation details will be structured in the ES plugin in an upcoming release.


1. ES scales easily, but the volume of BTS data is unlikely to exceed the billions in the short term, and a tuned PostgreSQL still performs well at the billion-row level. To make data backup and fast recovery easy we want a lightweight database solution, so we chose Postgres.

2. Our plan is to develop a BitShares C++ plugin that synchronizes data to PostgreSQL with batched writes; its speed should be similar to the current ES plugin. On top of the PostgreSQL solution, however, we can provide a faster recovery service: users download a compressed data package and import it directly, with no need to synchronize from scratch. That is what cuts the synchronization time.

3. In our worker proposal, beyond structuring the data, we focus on optimizing and customizing APIs for specific analysis requests.

For solving query problems, it is not bad to expose APIs in different ways and let them serve different data, is it?

Offline xeroc

There was a discussion about a more general approach, such as ZeroMQ, providing an interface through which any middleware can receive blocks, operations, and ideally even virtual operations. Adding a traditional database on top of that would be really easy - much easier than integrating SQL into the backend, IMHO.

Offline sschiessl

Sophisticated API calls for visualization would indeed be great!

Quote
What makes you think that PostgreSQL will be better in that regard?
* ES scales easily and was designed for cluster installations; Postgres wasn't.
* Synchronization time for PostgreSQL will most likely be even longer than for ES.
* Operation details will be structured in the ES plugin in an upcoming release.


1. ES scales easily, but the volume of BTS data is unlikely to exceed the billions in the short term, and a tuned PostgreSQL still performs well at the billion-row level. To make data backup and fast recovery easy we want a lightweight database solution, so we chose Postgres.

2. Our plan is to develop a BitShares C++ plugin that synchronizes data to PostgreSQL with batched writes; its speed should be similar to the current ES plugin. On top of the PostgreSQL solution, however, we can provide a faster recovery service: users download a compressed data package and import it directly, with no need to synchronize from scratch. That is what cuts the synchronization time.

3. In our worker proposal, beyond structuring the data, we focus on optimizing and customizing APIs for specific analysis requests.

For solving query problems, it is not bad to expose APIs in different ways and let them serve different data, is it?

1. Both Postgres and ElasticSearch are suitable for storing that amount of data; the main benefits of ElasticSearch are built-in clustering, advanced text-search capabilities (Lucene), and optimization for concurrent queries. Postgres is an old workhorse that offers many synergies and plain SQL queries, which many devs are used to. I'd argue that to build a plugin that offers a new API you could use either one.

2. ES allows snapshotting as well: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html

3. Customized APIs for specific analysis requests can also query the ES database.

Quote
We have analyzed the existing technical solutions in the BTS community:

a. ElasticSearch Plugin
https://github.com/bitshares/bitshares-core/wiki/ElasticSearch-Plugin
This is an excellent plugin: it provides comprehensive unstructured storage and indexing of historical transactions and objects on BTS, and it gave us a lot of inspiration and experience.
Its shortcomings are:

1) It is somewhat heavy and still demands a high-performance server.
2) Synchronization takes too long and there is no fast-build option; as far as we know, few community members currently use the ES plugin for data query or analysis.
3) Some content (such as fill details) is stored without being parsed into structure, so certain customized queries cannot be served.

b. python wrapper
https://github.com/oxarbitrage/bitshares-explorer-api
This wrapper provides a rich set of query endpoints, backed by ElasticSearch and some self-built data. Its drawback is that the data is imported on a schedule, so it is not real time (it updates daily); as the data keeps growing, each import will take longer and may become a problem.

My 2 cents on the above:

a.
  • 1) What are the minimum requirements you found to run ES?
  • 2) and 3) Snapshotting and the new version of the ES plugin solve those issues
b.
  • The link you provided is the open-explorer API, which uses Postgres in its backend as well. There, data is imported periodically; this is planned to switch to real time once the es_objects plugin is finalized
  • The python wrapper you mention queries ES directly, which is built for real-time data. A deployed example can be found here

In general, another data-storage solution can be interesting to explore. I think a lot of synergies could be created if an abstract RESTful API were defined (for example using Swagger). It could then also be included in the python wrapper, which would instantly create compatibility. What are your thoughts on that?

Offline vianull

Quote
There was a discussion about a more general approach, such as ZeroMQ, providing an interface through which any middleware can receive blocks, operations, and ideally even virtual operations. Adding a traditional database on top of that would be really easy - much easier than integrating SQL into the backend, IMHO.


Nice comment. That was one of our initial plans, but we finally gave up on it because of the underlying performance problem.
Given your comment, it is time to reconsider and test ZeroMQ or another message queue; with multi-threaded writes to the DB, performance may not be a problem. Thank you for the suggestion.

Offline vianull

Sophisticated API calls for visualization would indeed be great!

Quote
a. ElasticSearch Plugin
https://github.com/bitshares/bitshares-core/wiki/ElasticSearch-Plugin
ES has been a good plugin. It has provided a very comprehensive unstructured storage and an indexing of historical transactions and objects on BTS. By the way, ES gives us a lot of inspiration.
But there are some shortages,

1) Cumbersome. ES requires extremely high performance for the server;
2) The synchronization time is too long, lacking of a fast-built solution. Currently, as far as we know,  the usage of ES Plugin for data query or data analysis is limited.
3) Some content (such as related information of the transaction) is not structured. Indeed, ES stores them directly, which cannot satisfy some customized queries.

What makes you think that Postgresql will be better in that regard?
* ES scales easily and was designed for cluster installations, postgres wasn't.
* Synchronization time for Postgresql will most likely be even longer than for ES.
* Operation details will be structured in ES plugin in upcoming release.


1. ES is easy to scale, but the amount of BTS data may not be more than billion in short time. In addition, after optimization, Postgresql still has a good performance in the billion level. In order to facilitate data backup and quickly data recover, we hope to adopt a light database solution, so we use pg to solve problems.

2. Our plan is to develop Bishares' C++ Plugin for synchronizing data to PostgreSQL and also for batch submission. It should be similar with the current ES plugin speed.
However, based on the PostgreSQL solution, we can provide a faster recovery service. Users can download the compressed data package and import it directly. It does not need to synchronize from scratch. This is why it could reduce the synchronization time.

3. In our worker proposal, in addition to structuring, we prefer to optimize and customize APIs for specific analysis requests.

For solving query problems, it's not bad to finalize APIs in different way and let them provide various data, isn't it?

1. Both PostGre and ElasticSearch are suitable for storing that amount of data, whereas the most benefit of ElasticSearch is the built-in clustering, advanced text search capabilities (Lucene) and optimization for simultaneous queries. PostGre is an old horse which offers many synergies and pure SQL queries, which many devs are used to. I'd argue that to build a plugin that offers new API you could use either or.

2. ES allows snapshoting as well https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html

3. Customized APIs for specific analysis requests can also query the ES database

Quote
We have analyzed the existing technical solutions in BTS community:

a. ElasticSearch Plugin
https://github.com/bitshares/bitshares-core/wiki/ElasticSearch-Plugin
The ES plugin is a good one. It provides very comprehensive unstructured storage and indexing of the historical transactions and objects on BTS, and it has given us a lot of inspiration.
But it has some shortcomings:

1) Cumbersome. ES demands extremely high server performance;
2) The synchronization time is too long, and there is no fast-build solution. As far as we know, current usage of the ES plugin for data query or data analysis is limited.
3) Some content (such as transaction-related information) is not structured. ES stores it as-is, which cannot satisfy some customized queries.

b. python wrapper
https://github.com/oxarbitrage/bitshares-explorer-api
The Python wrapper provides a good API; its backend relies on ElasticSearch and some self-built data. However, the data is imported on a schedule, which means it is not real-time (it updates once a day), and as the data grows, each import takes more time. There have been some very mature projects in the community that provide us with valuable experience, but a lot of problems remain to be solved in the current programs.

My 2 cents to the above:

a.
  • 1) What are the minimum requirements you found to run ES?
  • 2) and 3) Snapshotting and the new version of the ES plugin solve those issues
b.
  • The link you have provided is the open-explorer API, which also uses PostGre in its backend. Here, data is periodically imported; this is planned to be switched to real-time with the finalization of the es_objects plugin
  • The python wrapper you mention queries ES directly, which is built for real-time data. A deployed example can be found here

In general, another solution for data storage can be interesting to explore. I think a lot of synergies can be created if an abstract RESTful API is defined (for example using Swagger). It would also allow inclusion in the python wrapper, which would instantly create compatibility. What are your thoughts on that?



Thank you very much for your reply!

1) I apologize that we haven't synced an ES node with full BitShares data; we have only built one on our own private chain. The wiki (https://github.com/bitshares/bitshares-core/wiki/ElasticSearch-Plugin) recommends at least 32 GB of memory based on the data volume of three months ago, and in the past three months the number of BTS operations has exploded. In contrast, https://bts.ai uses PostgreSQL to store hundreds of millions of rows, including RoR's web server and cache, with memory usage no higher than 4 GB. ES is a great solution, but it is really too heavy for us. We are going to run an ES full node to test the minimum requirements.

2) As far as I can guess, the reason ES occupies a lot of memory is that full indexing is used by default, which is very useful for full-text content search. But for BitShares, it has very little use.

3) I am very happy to hear that "snapshotting and the new version of the ES plugin solve those issues". We are the developers of bts.ai. Initially we didn't want to submit a worker; we just wanted to redesign bts.ai. While collecting requirements, we found that everyone was very interested in the requirements mentioned above, and then we came up with the idea of developing and open-sourcing this. The ES plugin is very good and has given us a lot of inspiration, but the current version does not meet our requirements. Work on the BitShares core software is challenging and there is a lot left to be developed, so we hope to complete our worker in a few months and contribute to the community instead of waiting.

4) In addition, I think an important feature of BitShares data is that it is a time series. For this we can apply many optimizations, such as BRIN indexes, which improve query performance very well. PostgreSQL has very good performance and features in this respect, which is another important reason why we chose it.
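The BRIN optimization mentioned above can be illustrated with plain SQL. This is a sketch only: the table name `operations` and column `block_time` are hypothetical, but the pattern matches append-only time-series data, where BRIN stores one tiny summary per block range instead of one B-tree entry per row:

```python
# DDL for a BRIN index on a hypothetical append-only operations table.
# Because block_time increases monotonically, each 128-page range maps to a
# narrow time interval, so the index stays tiny and range scans are cheap.
create_index = """
CREATE INDEX idx_operations_block_time
    ON operations USING brin (block_time)
    WITH (pages_per_range = 128);
"""

# A typical range query the BRIN index can serve:
range_query = """
SELECT count(*) FROM operations
 WHERE block_time >= '2018-10-01' AND block_time < '2018-11-01';
"""
```

The planner uses the BRIN summaries to skip every page range whose min/max `block_time` falls outside the queried window.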

Finally, I think the abstract RESTful API you mentioned is a very good idea. It can unify the specification and greatly reduce the difficulty of BitShares application development. We are very happy to see this happen and will try it in our work.
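As a sketch of what such an abstract API could look like as an OpenAPI (Swagger) document: the path, parameter names, and endpoint are hypothetical examples, not an agreed specification:

```python
# Hypothetical fragment of an OpenAPI 3.0 spec for a shared BitShares data API.
# Any backend (PostgreSQL, ES, ...) implementing this contract would be
# interchangeable from the client's point of view.
openapi_spec = {
    "openapi": "3.0.0",
    "info": {"title": "BitShares Data API", "version": "0.1.0"},
    "paths": {
        "/accounts/{account}/operations": {
            "get": {
                "summary": "Full operation history of an account",
                "parameters": [
                    {"name": "account", "in": "path", "required": True,
                     "schema": {"type": "string"}},
                    {"name": "from", "in": "query",
                     "schema": {"type": "string", "format": "date-time"}},
                ],
                "responses": {"200": {"description": "List of operations"}},
            }
        }
    },
}
```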
« Last Edit: November 05, 2018, 03:03:43 pm by vianull »

Offline sschiessl

Quote from: vianull

The indices of ES are all stored on the hard drive, not in RAM. The 32 GB memory requirement is probably just a best guess and not thought through; the RAM supports general performance and some caching, but does not hold the actual chain data. Servers with 64 GB RAM, a 500 GB hard drive and 1 GBit connectivity are still below 70 USD in Europe; I don't think anyone thought much about the different pricing in other regions.

Offline vianull

Quote from: sschiessl

In Alibaba Cloud, mainland China: 2 cores, 8 GB RAM and an 80 GB SSD cost 79 USD per month. The hardware you mentioned may cost over 400 USD here.

I agree that the technical plan has nothing to do with price. The reasons why we use PostgreSQL are mentioned above.

Offline oxarbitrage

In general I don't agree with this worker; here are some reasons:

- Cost. It costs almost 300k USD (288,000 USD to be exact).
- Limited. It is database-specific; a new worker may come along saying MongoDB or whatever is better for their specific needs.
- Closed to participation. No one except the bts.ai team can participate in the development.
- Closed source. It doesn't say that all the work will be open source under the MIT license. Will bts.ai be open source and inside the bitshares organization?
- Reinventing the wheel. The Elasticsearch plugin is working great and has all the data needed, the synchronization time is about 20 hours according to a recent report, and all the data inside operations is structured and available. That costs BitShares nothing, as it is already done. Doing the same from scratch with another database and a new team is IMHO a waste of resources. The core team, with its accumulated experience, could build a Postgres plugin in a very short time if that is what the community needs. The core team could also pay a team or individual to do the plugin, as plugins are core work and will need review and approval from core team members.
- Benefit. Besides some better visualizations of data, which I do think is important, I don't see any other real benefit in the proposal.

In my opinion, I would like to see some day a general worker that does this and other things in a bounty style; most of the API links mentioned are not being developed because there is no funding to get developers on board. There is already a Ruby project for BitShares at https://github.com/MatzFan/bitshares-ruby not being improved for lack of funding, among many other dead projects.

I think that BitShares needs a worker proposal similar to the core worker, where teams and individuals can participate in the development of different tools of the BitShares ecosystem.

It honestly looks to me like reinventing the wheel after all the work that has been done in this particular area: discarding everything and starting from scratch instead of building on top of previous tools to save time and resources and advance.

Just my personal opinion; I don't have any voting power or influence to decide what is accepted or not.

Offline vianull

Quote from: oxarbitrage


The important thing is NOT PostgreSQL vs. ES vs. Mongo or anything else. What we care most about is how to implement the requirements.

For example:
1) How to draw charts from specified data over a certain period, e.g. the average price of a market pair, the issue/burn numbers of an asset, and the feed price of a smart asset.
All of this data is currently unstructured in ES, which causes problems for our implementation.
2) BitShares is an exchange, so there may be a more professional approach to storing and analyzing the transaction data from an exchange viewpoint.

As mentioned in this worker, we will open source it under the MIT license. If this worker is approved, then all future data APIs on bts.ai will be open source as part of this worker.

Offline oxarbitrage

If you want to get how much of an asset was issued in a period of time, you can filter the Elasticsearch operations by operation_type 14 (ASSET ISSUE) and by operation_history.op_object.asset_to_issue.asset_id within that period; please check this link:
http://148.251.96.48:5601/app/kibana#/discover?_g=()&_a=(columns:!(_source),index:'357b9f60-d5f8-11e8-bb51-9583fd938437',interval:auto,query:(language:lucene,query:'operation_type:%2014'),sort:!(block_data.block_time,desc))

If you want to go after the feed prices of a smart asset you can filter by operation type 19 and smart asset id: http://148.251.96.48:5601/app/kibana#/discover?_g=()&_a=(columns:!(_source),index:'357b9f60-d5f8-11e8-bb51-9583fd938437',interval:auto,query:(language:lucene,query:'operation_type:%2019%20AND%20%20operation_history.op_object.asset_id:%201.3.113'),sort:!(block_data.block_time,desc))
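The Kibana filters above can also be sent directly as an Elasticsearch query DSL body. A sketch for the asset-issue case, using the field names from the links; the asset id and time window are example values only:

```python
# ES query DSL body for "asset issue operations for one asset in a time range",
# mirroring the Kibana filter (operation_type 14 = ASSET ISSUE). The asset id
# and dates are hypothetical example values.
asset_issue_query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"operation_type": 14}},  # ASSET ISSUE
                {"term": {"operation_history.op_object.asset_to_issue.asset_id": "1.3.113"}},
                {"range": {"block_data.block_time": {
                    "gte": "2018-10-01", "lt": "2018-11-01"}}},
            ]
        }
    }
}
```

The feed-price case works the same way, swapping in operation_type 19 and the `operation_history.op_object.asset_id` field.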

In both kibana links you can change the timeframe in the upper right.

If you want to get market price changes you can go after the fill order and so on.

Also, when operations are not enough, there is the es_objects plugin that allows exporting certain (currently predefined) objects; however, I am making some changes to try to make it work with any blockchain object, so for example you could get the internal ticker objects for a pair and read the current and past market prices from there.

I strongly believe that those two plugins are going in the same direction you want to go.

Offline Fox

- 1.1 Gather data from external APIs of CEX
  - Support publishing open source tools/process to collect data from CEX
  - Support collection of public data to be published as an open source dataset (may not meet EULA/TOS of provider(s))
  - See [Bitshares-Core/1350](https://github.com/bitshares/bitshares-core/issues/1350): NVT Data Collection and Visualization
- 1.2 Data visualization
  - Support the concept.
  - Provided already by [Kibana](https://www.elastic.co/products/kibana) today (visualization tool for Elasticsearch)
- 2 Multi-dimensional query and data export from DEX
  - Support.
  - Provided already by Elasticsearch
  - See [Bitshares-Core/1350](https://github.com/bitshares/bitshares-core/issues/1350): NVT Data Collection and Visualization
- 3 Data backup service
  - Support the concept. Have open questions about the implementation:
    - How does this differ from a seed node?
    - If not a seed node, how can the service provider attest data authenticity? (trust required)
    - Opinion: best if block producers snapshot, attest and post to bitshares.org (caveat: lose history data)
- 4 Address identification and classification
  - Concerned this may lead to censorship
- 5 Analysis of Witnesses (feeds, nodes, etc.)
  - Support the concept

Suggestions for Other:
  - Replace focus on Postgresql Plugin in favor of ZeroMQ implementation
  - Focus on building out APIs and wrappers (using Elasticsearch) that UI Team can build from
  - DevOp tools for node operators (seed, api, block producers)

I look forward to iterating on a proposal that advances development and adoption of the BitShares platform.

Best,
Fox
Witness: fox