QuantNet

QuantNet is designed as a web interface for freely exchanging numerical methods, called Quantlets. Its aim is to provide a centralized system made up of documents from different scientific areas, submitted by authors ranging from professional researchers to university students.

As part of the Collaborative Research Center, the Center for Applied Statistics and Economics and the International Research Training Group (IRTG) 1792, QuantNet contributes to the goal of strengthening and improving empirical economic research in Germany.

At present QuantNet contains, or can incorporate, source code written in:

R, Matlab, SAS, Python, Stata

Books

To begin with, we have published the codes used in the following books, projects and discussion papers by the Ladislaus von Bortkiewicz Chair of Statistics:

QuantNet @ GitHub

GitHub Organization Quantlet

QuantNet is now an online, GitHub-based organization with diverse repositories of scientific information consisting of statistics-related documents and program code. The advantages of QuantNet are:
  • Full integration with GitHub
  • Proprietary GitHub-R-API implementation, developed from the core package github, which is available as the GitHub repository "R Bindings for the Github v3 API" by Carlos Scheidegger, professor in the Department of Computer Science at the University of Arizona
  • Text mining pipeline providing information retrieval, document clustering and D3 visualizations, realized via QuantMining, a "GitHub API based QuantNet Mining infrastructure in R"
  • Tuned and integrated search engine within the main D3 visualization, based on validated meta information in Quantlets
  • Ease of discovery and use of your technology and research results, everything in a single GitHub Markdown page
  • Standardized audit and validation of your technology by means of the Style Guide and the Yamldebugger package

The QuantNet Style Guide enables a standardized audit and validation of new Quantlets by means of comprehensive help pages and the Yamldebugger package. The Style Guide contains several subsections:
  1. Style guide of Quantlets: an overview of the structure of a Quantlet
  2. Characteristics and mandatory data fields of the YAML meta info file Metainfo.txt
  3. Examples of complete and correct meta infos
  4. The main YAML rules most relevant for QuantNet
  5. Instructions on how to format R code, with examples of using the formatR package
  6. Basic instructions for the GitHub Desktop client
  7. Main information about the purpose of the Yamldebugger package and further guidelines (technical terms, Quantlet repository structure, special characters etc.)
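
The completeness check on mandatory data fields can be pictured with a small sketch. It is an illustration only and not the Yamldebugger implementation: the field names in MANDATORY_FIELDS and both helper functions are assumptions made for this example, and the parser handles only flat "key: value" lines rather than full YAML.

```python
# Illustrative sketch only; this is NOT the Yamldebugger package.
# ASSUMPTION: the field names below are placeholders for the mandatory
# fields that the Style Guide defines for Metainfo.txt.
MANDATORY_FIELDS = ("Name of Quantlet", "Description", "Keywords", "Author")


def parse_flat_metainfo(text):
    """Parse flat 'key: value' lines into a dict (a tiny subset of YAML)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and lines without a key separator.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def missing_fields(fields, mandatory=MANDATORY_FIELDS):
    """Return the mandatory fields that are absent or empty."""
    return [name for name in mandatory if not fields.get(name)]


# Hypothetical meta info with the author field left out.
example = """
Name of Quantlet: ExampleQuantlet
Description: Plots a toy time series.
Keywords: plot, time series
"""

print(missing_fields(parse_flat_metainfo(example)))  # ['Author']
```

A real validation would rely on a full YAML parser and on the field list given in the Style Guide; the sketch only shows the shape of the check.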

YAML is a human-friendly data serialization standard for all programming languages. Designed in 2001 as a human-readable, data-oriented language, YAML maps easily onto widely used data structures such as lists and arrays. It is also convenient for maintaining hierarchical data because it avoids the excessive use of brackets, tags and other enclosures that can make the document structure harder to follow. The design goals for YAML are, in decreasing priority:
  1. YAML is easily readable by humans
  2. YAML data is portable between programming languages
  3. YAML matches the native data structures of agile languages
  4. YAML has a consistent model to support generic tools
  5. YAML supports one-pass processing
  6. YAML is expressive and extensible
  7. YAML is easy to implement and use
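
The point about brackets and enclosures can be seen by rendering the same nested record in YAML block style and in JSON. The following is a minimal sketch: the record and the helper to_yaml_like are hypothetical, and the helper covers only dictionaries and lists of plain scalars without quoting or escaping, so it is not a general YAML writer.

```python
import json


def to_yaml_like(value, indent=0):
    """Render nested dicts and lists of plain scalars in YAML block style.

    Simplified sketch: no quoting or escaping is applied, so it is only
    suitable for simple strings and numbers like those in the example.
    """
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(item, (dict, list)):
                lines.append(f"{pad}{key}:")
                lines.extend(to_yaml_like(item, indent + 1))
            else:
                lines.append(f"{pad}{key}: {item}")
    elif isinstance(value, list):
        for item in value:
            if isinstance(item, (dict, list)):
                raise TypeError("collections nested in lists are not handled in this sketch")
            lines.append(f"{pad}- {item}")
    else:
        lines.append(f"{pad}{value}")
    return lines


# Hypothetical record, used only for illustration.
record = {
    "name": "ExampleQuantlet",
    "keywords": ["plot", "time series"],
    "author": {"first": "Jane", "last": "Doe"},
}

yaml_text = "\n".join(to_yaml_like(record))
json_text = json.dumps(record)

print(yaml_text)  # indentation carries the hierarchy, no braces or brackets
print(json_text)  # the same data, delimited by braces, brackets and quotes
```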

Yamldebugger: To simplify and automate the validation of new Quantlets, the YAML parser debugger package (Yamldebugger for short) was developed to test and certify local versions of the GitHub repositories containing YAML metadata; see the Yamldebugger repository for implementation details. The Yamldebugger fulfills two main tasks. First, it checks the Quantlet repository structure, the validity of the YAML meta information and the completeness of the mandatory data fields as described in the Style Guide. Second, it helps to analyze, standardize and unify the different YAML data fields, which are subject to varying spellings and notations. The introductory Quantlets provide more examples of how to install and run the Yamldebugger with additional analysis and visualization capabilities. The following figure visualizes correlations between the most frequent keywords of the document-term matrix, which was extracted from the keywords in the Quantlet YAML meta infos:

GitHub Collaboration

Git2Q3-Collaboration: Build software better, together, now (QuantNet 2.0 @ GitHub)

Collaboration Visualization of selected repositories of this organisation:



Collaboration on the MVA-Ready repository over time, created via the GitHub API and GitHubVisualizer:

QuantNetXploRer

QuantNetXploRer is the Q3-D3-LSA driven, GitHub-based search engine for QuantNet. In addition, any GitHub organization that follows the YAML Style Guide of QuantNet can be visualized directly through the QuantNetXploRer. The Q3-D3-LSA technology comprises the following main components:
  • Q3 (Quantlets, QuantNet, QuantMining): Scientific data pool and data mining infrastructure for collaborative reproducible research
  • D3 (Data-Driven Documents): Knowledge discovery via information visualization by use of the D3 JavaScript library combining powerful visualization components and a data-driven approach
  • LSA (Latent Semantic Analysis): Semantic embedding for higher clustering performance and automatic document classification by topic labeling

TM Pipeline of the “GitHub API based QuantNet Mining infrastructure in R”

Q3

The structure and objectives of the "GitHub API based QuantNet Mining infrastructure in R" (Q3), SFB 649 Discussion Paper, Borke and Härdle (2017), are described by the diagram above. The text mining (TM) pipeline starts at the Parser 1 node and follows the path to its end point at the Smart Clusterization node. Alternatively, the pipeline can start with other parsers, depending on the data source: Parser 2 or Parser 3 can be used to process the special repository structure of papers or external books, respectively. In summary, the GitHub-R-API based and rgithub driven TM pipeline (including three parser types) retrieves the YAML-encoded meta information of Quantlets via the Yamldebugger package. The LSA model is then applied, clusters and labels are generated (using the TManalyzer layer and the Validation Pipeline), and the processed data is transferred via JSON into the D3 application, which is the visualization layer of the QuantNetXploRer.
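
The last step of the pipeline, handing clustered documents to the visualization layer as JSON, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the sample documents, the function to_hierarchy and the nested name/children layout (a shape that D3 hierarchy layouts can read by default) are not taken from the QuantNetXploRer source.

```python
import json
from collections import defaultdict

# ASSUMPTION: hypothetical documents with cluster labels, standing in for
# the output of the clustering and labeling stage.
labeled_docs = [
    {"name": "QuantletA", "cluster": "time series"},
    {"name": "QuantletB", "cluster": "regression"},
    {"name": "QuantletC", "cluster": "time series"},
]


def to_hierarchy(docs, root_name="QuantNet"):
    """Group documents by cluster label into a nested name/children tree."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["cluster"]].append({"name": doc["name"]})
    children = [
        {"name": label, "children": members}
        for label, members in sorted(groups.items())
    ]
    return {"name": root_name, "children": children}


# Serialize the tree; a browser-side D3 script could then load this text.
payload = json.dumps(to_hierarchy(labeled_docs), indent=2)
print(payload)
```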

D3

D3.js (or just D3, for Data-Driven Documents) is a JavaScript library for producing dynamic, interactive data visualizations in web browsers. The QuantNetXploRer is a good example of D3 in action. More information about the D3 architecture, its various designs and the D3-based QuantNetXploRer can be found in Bostock, M., Ogievetsky, V. and Heer, J. (2011), "D³ Data-Driven Documents", IEEE Transactions on Visualization and Computer Graphics 17(12), and in Borke and Härdle (2017), "Q3-D3-LSA", Handbook of Big Data Analytics, Springer Verlag, ISBN 978-3-319-18284-1. The repository D3Genesis contains detailed information about the development of the main D3 components for the QuantNet visualization, together with live examples on GitHub Pages.


LSA

Latent semantic analysis (LSA) is a technique for incorporating semantic information into the measure of similarity between two documents. LSA captures semantic information through co-occurrence analysis of the text corpus. The document feature vectors are projected into the subspace spanned by the first k singular vectors of the feature space; the projection is obtained by computing the singular value decomposition (SVD) of the term-document matrix. The main advantage of LSA is its dimension reduction property. Our benchmark results suggest that the LSA model is applicable to Big Data and has a modest time complexity. The "semantic kernel" can be interpreted as the correlation of terms in the lower, k-dimensional semantic space. The figure below shows a randomly chosen 30 × 30 submatrix of the semantic kernel derived from the Quantlet YAML metadata:


Randomly chosen 30 × 30 submatrix of the semantic kernel derived from the Quantlet YAML metadata
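
In symbols, one common formulation of the LSA projection reads as follows. The notation (A, U_k, Σ_k, V_k and the similarity κ) is an assumption chosen for this sketch, with terms as rows and documents as columns; the exact weighting and kernel definition used in the Q3-D3-LSA paper may differ.

```latex
% A: term-document matrix (m terms, n documents) of rank r
A = U \Sigma V^{\top}, \qquad
A_k = U_k \Sigma_k V_k^{\top}, \quad k \le r

% projection of a document vector d \in \mathbb{R}^m
% onto the first k left singular vectors
\hat{d} = U_k^{\top} d \in \mathbb{R}^k

% similarity of two documents in the k-dimensional semantic space
\kappa(d_1, d_2) = \hat{d}_1^{\top} \hat{d}_2
                 = d_1^{\top} \, U_k U_k^{\top} \, d_2
```

Under this reading, the m × m matrix U_k U_k^T weights pairs of terms, which fits the interpretation of the semantic kernel as a term correlation structure in the reduced space.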


Recommended technical requirements

The QuantNetXploRer has been tested in Google Chrome, Microsoft Edge, Mozilla Firefox and Apple Safari at the following popular resolutions:
1920 x 1080, 1680 x 1050, 1440 x 900, 1280 x 800 and 1366 x 768.
For the best usability experience we recommend Google Chrome at a resolution of 1920 x 1080 (or higher). Thanks to Google's high-performance V8 JavaScript engine, this browser achieves the best D3 performance on Windows, Mac OS X and Linux systems.

Team


Responsible for the content:


Technical implementation:


Collaboration team at GitHub:


References

Ideas, papers, theory and code used in this project:


Title | Published as | Authors | Year
Q3-D3-LSA | Handbook of Big Data Analytics, Springer Verlag, ISBN 978-3-319-18284-1 | Lukas Borke, Wolfgang Karl Härdle | 2017
GitHub API based QuantNet Mining infrastructure in R (Q3) | SFB 649 Discussion Paper 2017-008 | Lukas Borke, Wolfgang Karl Härdle | 2017
Yamldebugger | R Package: YAML parser debugger according to the QuantNet style guide | Lukas Borke | 2016
Yamldebugger introduction | Introductory Quantlets provide examples on how to install and run the Yamldebugger | Lukas Borke | 2016
Q3D3LSA - Quantlets for the corresponding paper | LSA analysis and high dimensional matrix representations of the singular value decomposition results and semantic kernels | Lukas Borke | 2016
D3Genesis | Development of the main D3 components for the QuantNet visualization | Lukas Borke | 2016
Clustering Validation Pipeline | GitHub-API-Driven Clustering with 5-level Text Mining Validation Pipeline: R based Approach, part of the Q3 paper | Lukas Borke | 2016
RGoogleAnalytics | Quantlets for the Section "Google Analytics" in the Q3 paper | Lukas Borke | 2017

Imprint


Humboldt-Universität zu Berlin
School of Business and Economics
Spandauer Str. 1
D - 10178 Berlin
Phone: +49-30-2093 5708

Responsible for the content: Prof. Dr. Wolfgang Härdle
Webmasters: Lukas Borke and Svetlana Bykovskaya

Notice of liability

Although the contents of this website and of linked third-party websites are regularly checked, Humboldt-Innovation GmbH does not assume any responsibility or liability for the contents of third-party websites. The operators of the linked third-party websites are exclusively responsible for the contents of their sites. For further information please read the disclaimer.



Terms of use
Humboldt-Innovation GmbH provides access to this website according to the following terms of use:
By using this website, you agree to the following terms of use. Humboldt-Innovation GmbH is entitled to change these terms without prior notice. Continued use of this website amounts to acceptance of such changes.
  1. Use
    By using this website, you agree to follow the applicable laws and general etiquette. The use of this website or any of the contents made available here for any purposes other than those explicitly allowed by these terms of use is not permitted.
  2. Copyright
    This website and all the related texts, images and other material are protected by copyright and other laws. The use of these contents for any purposes other than those explicitly allowed by the applicable copyright regulations is not permitted without the prior written consent of Humboldt-Innovation GmbH. You may view or print one copy of the materials or the contents of the website on a single computer for private or business purposes but not for commercial use, provided that all copyright notices remain unchanged and intact. The systematic downloading of materials from this website for collections, archives, directories or databases is not permitted. Humboldt-Innovation GmbH respects the copyright of third parties and provides all the material on this website in good faith. If you believe that you have any copyrights or any other kind of protection rights with respect to the materials made available on this website, please notify Humboldt-Innovation GmbH immediately.
  3. Trademarks
    The names of other companies or products named on this website are the relevant trademarks of their respective owners where applicable. These trademarks shall not be used for products or services that have not been authorized by their respective owners.
  4. Warranty/liability
    Humboldt-Innovation GmbH neither guarantees nor accepts any liability for the information made available on this website being up-to-date, complete or accurate. Humboldt-Innovation GmbH cannot be held liable for any damages resulting from the use of or the inability to use the website or its content and links. Humboldt-Innovation GmbH is not responsible for technical faults or print errors. Humboldt-Innovation GmbH can neither be held responsible for damages resulting from the use of the software used by this website or the web server (including computer viruses), nor does Humboldt-Innovation GmbH accept any responsibility for damages resulting from unauthorized access to this website, the server or the data connection. This website is regularly updated, including the terms of use, and may contain unannounced changes. Humboldt-Innovation GmbH does not guarantee the future availability of information, particularly following updates.
  5. Availability and appropriateness
    Humboldt-Innovation GmbH cannot be held responsible for the accessibility of this website in certain countries and regions or for the compliance of the information or materials provided here with the laws or customs in countries outside Germany. You access this information at your own risk and are thus personally and fully responsible for complying with the locally applicable laws.
  6. External links/disclaimer
    Links to third-party websites are provided as an additional service. We provide these links in good faith and with no guarantees of any kind, as Humboldt-Innovation GmbH has no influence on these web pages in any way. Humboldt-Innovation GmbH cannot be held responsible for the content or availability of these sites or for the links that they provide.
  7. Privacy
    Humboldt-Innovation GmbH does not collect any private information about users unless stated otherwise. Some areas of this website may require personal information from the user in order to enable the website to interact with the user. By giving this information, you agree that Humboldt-Innovation GmbH may use the data that you provide in accordance with the regulations of the German Data Protection Act.
  8. Google Analytics privacy policy
    Source: www.datenschutzbeauftragter-info.de

    This website uses Google Analytics, a web analytics service provided by Google, Inc. (“Google”). Google Analytics uses “cookies”, which are text files placed on your computer, to help the website analyze how users use the site. The information generated by the cookie about your use of the website (including your IP address) will be transmitted to and stored by Google on servers in the United States. In case of activation of the IP anonymization, Google will truncate/anonymize the last octet of the IP address for Member States of the European Union as well as for other parties to the Agreement on the European Economic Area. Only in exceptional cases, the full IP address is sent to and shortened by Google servers in the USA. On behalf of the website provider Google will use this information for the purpose of evaluating your use of the website, compiling reports on website activity for website operators and providing other services relating to website activity and internet usage to the website provider. Google will not associate your IP address with any other data held by Google. You may refuse the use of cookies by selecting the appropriate settings on your browser. However, please note that if you do this, you may not be able to use the full functionality of this website. Furthermore you can prevent Google’s collection and use of data (cookies and IP address) by downloading and installing the browser plug-in available under https://tools.google.com/dlpage/gaoptout?hl=en-GB. You can refuse the use of Google Analytics by clicking on the following link. An opt-out cookie will be set on the computer, which prevents the future collection of your data when visiting this website: Activate Google Analytics

    Further information concerning the terms and conditions of use and data privacy can be found at http://www.google.com/analytics/terms/gb.html or at http://www.google.com/intl/en_uk/analytics/privacyoverview.html. Please note that on this website, Google Analytics code is supplemented by “gat._anonymizeIp();” to ensure an anonymized collection of IP addresses (so called IP-masking).

  9. General terms
    These terms of use replace all previous terms. There are no additional agreements relating to these terms of use.
    Should any provision of this agreement be or become invalid, ineffective or unenforceable, the remaining provisions of this agreement shall be valid. The Parties agree to replace the invalid, ineffective or unenforceable provision by a valid, effective and enforceable provision which economically best meets the intention of the Parties.
    This website is managed by the Ladislaus von Bortkiewicz Chair of Statistics at Humboldt-Universität zu Berlin,
    Spandauer Str. 1, 10178 Berlin, Germany.
    The place of performance and jurisdiction for all claims and legal disputes arising from this agreement is Berlin. This clause does not apply to users acting in their capacity as consumers. This agreement is governed by the laws of the Federal Republic of Germany with the exception of private international law rules.