Info

Download the zip files and use them to build models.

Extra notes:

  1. The first column in each dataset is the target variable. Its name is prefixed with "target_", e.g., "target_survived" or "target_Class".

  2. The original class variable in some datasets has more than 2 values; we have binarized these variables so the datasets can be used for binary classification modeling. See the source code for the exact treatments.

  3. Because we lack domain knowledge, some data treatments may not follow best practices. To change a treatment, modify the source code and regenerate the zipped data file.
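
The notes above can be sketched in code. The following is a minimal, standard-library-only example that reads a dataset from a zip file and separates the first column (the target) from the remaining columns. It assumes that each archive contains a single comma-separated file with a header row; that layout is not stated here, so check the archive contents first. The `binarize` helper is purely hypothetical and only illustrates what a one-vs-rest binarization looks like; the actual rules applied to each dataset are in the source code linked in the table.

```python
import csv
import io
import zipfile


def load_dataset(zip_source):
    """Read the first CSV found in a zip archive and split off the target column.

    Assumption: the archive holds one comma-separated file with a header row.
    `zip_source` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no .csv member found in the archive")
        with archive.open(csv_names[0]) as raw:
            reader = csv.reader(io.TextIOWrapper(raw, encoding="utf-8", newline=""))
            header = next(reader)
            rows = [row for row in reader if row]  # skip empty lines

    # Per the notes, column 0 is the target and its name starts with "target_".
    target_name = header[0]
    if not target_name.startswith("target_"):
        raise ValueError(f"unexpected first column: {target_name!r}")

    y = [row[0] for row in rows]
    X = [row[1:] for row in rows]
    return target_name, y, header[1:], X


def binarize(labels, positive_values):
    """Hypothetical one-vs-rest rule: 1 if the label is in positive_values, else 0."""
    positives = set(positive_values)
    return [1 if label in positives else 0 for label in labels]
```

All values are returned as strings, so convert them to numeric types as needed before modeling.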

Datasets
| # | Dataset | Description | Source Code |
|---|---------|-------------|-------------|
| 1 | keel_001_kdd_cup_1999.zip | KDD Cup 1999 data set | Link |
| 2 | keel_002_sonar_mines_vs_rocks.zip | Sonar, Mines vs. Rocks data set | Link |
| 3 | keel_003_molecular_biology.zip | Molecular Biology (Splice-junction Gene Sequences) data set | Link |
| 4 | keel_004_connect_4.zip | Connect-4 data set | Link |
| 5 | uci_001_adult_data_set.zip | Predict whether income exceeds $50K/yr based on census data | Link |
| 6 | uci_002_bank_marketing.zip | Predict if the client will subscribe to a term deposit | Link |
| 7 | uci_003_human_activity_recognition.zip | Human Activity Recognition Using Smartphones | Link |
| 8 | uci_004_credit_approval.zip | Credit Approval Data Set | Link |
| 9 | uci_005_cylinder_bands.zip | Used in decision tree induction for mitigating process delays | Link |
| 10 | uci_006_internet_advertisements.zip | Represents a set of possible advertisements on Internet pages | Link |
| 11 | uci_007_ionosphere.zip | Classification of radar returns from the ionosphere | Link |
| 12 | uci_008_letter_recognition.zip | Try to identify the letter | Link |
| 13 | uci_009_multiple_features.zip | This dataset consists of features of handwritten numerals (0 - 9) | Link |
| 14 | uci_010_mushroom.zip | Classification: poisonous or edible | Link |
| 15 | uci_011_spambase.zip | Classifying Email as Spam or Non-Spam | Link |
| 16 | uci_012_insurance_company_benchmark.zip | Insurance Company Benchmark (COIL 2000) | Link |
| 17 | uci_013_german_credit_data.zip | German Credit Data | Link |
| 18 | uci_014_secom.zip | Data from a semi-conductor manufacturing process | Link |
| 19 | uci_015_qsar_biodegradation.zip | Classify 1055 chemicals into 2 classes | Link |
| 20 | uci_016_seismic_bumps.zip | Seismic bumps forecasting in a coal mine | Link |
| 21 | uci_017_thoracic_surgery_data.zip | Classification problem related to lung cancer patients | Link |
| 22 | uci_018_phishing_websites.zip | Important features in predicting phishing websites | Link |
| 23 | uci_019_default_of_credit_card_clients.zip | Default of credit card clients | Link |
| 24 | uci_020_sports_articles_objectivity.zip | Sports articles for objectivity analysis | Link |
| 25 | uci_021_heart_disease.zip | Heart Disease Data Set | Link |
| 26 | uci_022_dermatology.zip | Determine the type of Erythemato-Squamous Disease | Link |
| 27 | uci_023_madelon.zip | NIPS 2003 feature selection challenge | Link |
| 28 | uci_024_ozone_level_detection.zip | Ozone Level Detection Data Set | Link |
| 29 | uci_025_parkinsons.zip | Discriminate healthy people from those with PD | Link |
| 30 | uci_026_cardiotocography.zip | Cardiotocography Data Set | Link |
| 31 | uci_027_miniboone_particle_identification.zip | Distinguish electron neutrinos from muon neutrinos | Link |
| 32 | uci_028_gas_sensor_array_drift.zip | Gas Sensor Array Drift Dataset | Link |
| 33 | uci_029_cnae_9.zip | 1080 documents categorized into a subset of 9 categories | Link |
| 34 | uci_030_climate_model_simulation_crashes.zip | Predict climate model simulation crashes | Link |
| 35 | uci_031_eeg_eye_state.zip | 14 EEG values and a value indicating the eye state | Link |
| 36 | uci_032_lsvt_voice_rehabilitation.zip | Voice rehabilitation treatment 'acceptable' or 'unacceptable' | Link |
| 37 | uci_033_urban_land_cover.zip | Classification of urban land cover using aerial imagery | Link |
| 38 | uci_034_diabetes_130_us_hospitals.zip | Diabetes factors related to readmission as well as other outcomes | Link |
| 39 | uci_035_gesture_phase_segmentation.zip | Aimed at studying Gesture Phase Segmentation | Link |
| 40 | uci_036_student_performance.zip | Predict student performance in secondary education | Link |
| 41 | uci_037_sensorless_drive_diagnosis.zip | Sensorless Drive Diagnosis Data Set | Link |
| 42 | uci_038_tv_news_channel_commercial_detection.zip | Automatic identification of commercial blocks in news videos | Link |
| 43 | uci_039_diabetic_retinopathy_debrecen.zip | Predict whether an image contains signs of diabetic retinopathy or not | Link |
| 44 | uci_040_online_news_popularity.zip | Predict the number of shares in social networks | Link |
| 45 | uci_041_mice_protein_expression.zip | Mice Protein Expression Data Set | Link |
| 46 | uci_042_occupancy_detection.zip | Binary classification (room occupancy) from a few features | Link |
| 47 | uci_043_gas_sensors_for_home_activity.zip | Gas sensors for home activity monitoring | Link |
| 48 | uci_044_polish_companies_bankruptcy.zip | Bankruptcy prediction of Polish companies | Link |
| 49 | uci_045_htru2.zip | Candidates must be classified into pulsar and non-pulsar classes | Link |
| 50 | uci_046_cervical_cancer.zip | Prediction of indicators/diagnosis of cervical cancer | Link |
| 51 | uci_047_epileptic_seizure_recognition.zip | Epileptic seizure detection | Link |
| 52 | uci_048_burst_header_packet.zip | Identify the risks of the Burst Header Packet (BHP) flood attacks | Link |
| 53 | uci_049_extention_of_z_alizadeh_sani.zip | CAD diagnosis | Link |
| 54 | uci_050_ida2016challenge.zip | Heavy Scania trucks APS Failure detection | Link |
| 55 | uci_051_hcc_survival.zip | Predict the survival at 1 year | Link |
| 56 | uci_052_online_shoppers_purchasing_intention.zip | Online Shoppers Purchasing Intention Dataset | Link |
| 57 | uci_053_electrical_grid_stability.zip | Electrical Grid Stability Simulated Data | Link |
| 58 | uci_054_caesarian_section_classification.zip | Caesarian Section Classification Dataset | Link |
| 59 | uci_055_audit_data.zip | Build a predictor for classifying suspicious firms | Link |
| 60 | uci_056_hepatitis_c_virus.zip | Hepatitis C Virus (HCV) for Egyptian patients | Link |
| 61 | uci_057_glass_identification.zip | The study of classification of types of glass | Link |
| 62 | uci_058_iris.zip | The best known database in the pattern recognition literature | Link |
| 63 | uci_059_optical_recognition_of_handwritten_digits.zip | Optical Recognition of Handwritten Digits | Link |
| 64 | vanderbilt_001_titanic.zip | The survival status of individual passengers on the Titanic | Link |
| 65 | vanderbilt_002_acute_bacterial_meningitis.zip | Acute Bacterial Meningitis Dataset | Link |
| 66 | vanderbilt_003_ari_dataset.zip | Datasets analyzed in Chapter 14 of Regression Modeling Strategies | Link |
| 67 | vanderbilt_004_duchenne_muscular_dystrophy.zip | Chances of being a carrier based on serum markers | Link |
| 68 | vanderbilt_005_right_heart_catheterization.zip | Right Heart Catheterization Dataset | Link |
| 69 | vanderbilt_006_ucla_stress_echocardiography.zip | Determine if a drug could be used effectively in a test | Link |
| 70 | vanderbilt_007_support_study.zip | Fit highly nonlinear predictor effects | Link |
| 71 | vanderbilt_008_very_low_birth_weight_infants.zip | Very Low Birth Weight Infants Dataset | Link |
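
The file names listed above appear to follow a `<source>_<three-digit index>_<name>.zip` pattern, which can be handy for grouping or filtering datasets by origin. The sketch below parses such a name. The pattern is inferred from the names in this list only, and `parse_dataset_filename` is a hypothetical helper, not part of the project.

```python
import re

# Pattern inferred from the listed file names: <source>_<3-digit index>_<name>.zip
_NAME_RE = re.compile(r"^(?P<source>[a-z]+)_(?P<index>\d{3})_(?P<name>.+)\.zip$")


def parse_dataset_filename(filename):
    """Split a dataset file name into (source, index, name)."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized dataset file name: {filename!r}")
    return match["source"], int(match["index"]), match["name"]
```

For example, `parse_dataset_filename("uci_014_secom.zip")` yields the source `uci`, the index `14`, and the name `secom`.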