Make Your Own Python Package and Share with Others in AWS SageMaker

Packaging in Python is the process of creating distributable packages of code that can be easily installed and used by other developers. These packages can contain modules, functions, classes, and other components that can be shared and reused across different projects.

Advantages of packaging functions

Modularity: By packaging related functions together, you can organize your code into reusable modules that can be used across multiple projects. This helps to avoid duplicating code and makes it easier to maintain and update your code.
Encapsulation: Packaging functions in a module hides their implementation details, so they can be used without worrying about the underlying implementation.
Code reusability: If you have a set of functions that perform a specific task, packaging them into a module makes it easy to reuse them in other projects without rewriting them.
Code organization: Packaging functions into a module helps to keep your code organized and easy to navigate. This is especially helpful for larger projects with many different functions.
Easy installation: Once your functions are packaged, others can install and use your code with pip. This makes it easier to share your code and collaborate on projects.

Process of creating a package

Here's an example of a Python package with functions to add and subtract two numbers, and how to make it available in Amazon SageMaker.

Create package directory

First, create a new directory for your project and navigate to it in your terminal or command prompt. Inside it, create a directory called vp_package; this is the package itself. Then create a new file called __init__.py in the vp_package directory (note the double underscores on both sides). This file marks the directory as a Python package and can be empty. Then create a new file called vp_math.py in the same directory. This file will contain the functions to add and subtract two numbers.

Your directory structure should look something like this:

    vp_package/
        __init__.py
        vp_math.py

Adding code

Now you can add the following code to your vp_math.py file:

    def add(x, y):
        return x + y

    def subtract(x, y):
        return x - y

Create a setup.py file

Now, create a setup.py file in your project directory, next to (not inside) the vp_package directory. The packages=['vp_package'] entry is resolved relative to the location of setup.py, so the build fails if setup.py sits inside the package directory itself. Use the following code:

    from setuptools import setup

    setup(
        name='vp_package',
        version='0.0.1',
        description='A basic package with functions to add and subtract two numbers',
        packages=['vp_package']
    )

Build and package

Now, build the distribution file by running the following command in your terminal or command prompt, from the directory that contains setup.py:

    python setup.py sdist

This will create a .tar.gz file in a newly created dist directory.

Upload the package to a shared place

Upload the package to an S3 bucket that SageMaker can access. For this, you can use the AWS CLI, for example:

    aws s3 cp dist/vp_package-0.0.1.tar.gz s3://my-bucket/

Make sure to replace my-bucket with the name of the S3 bucket you want to use.

Download and make package available in SageMaker

You can import a Python package from an S3 bucket by downloading the archive to your local machine or server, extracting it, and then adding the extracted directory to Python's sys.path. Here are the steps.

    # Download the package from S3 to the current directory
    aws s3 cp s3://my-bucket/vp_package-0.0.1.tar.gz .

    # Extract the archive; this creates a directory named "vp_package-0.0.1"
    tar -xzf vp_package-0.0.1.tar.gz

We can also run the step above in a Jupyter Notebook by using the ! operator:

    !tar -xzf vp_package-0.0.1.tar.gz

Then, in Python:

    # Add the extracted directory (the one that contains the vp_package folder) to sys.path
    import sys
    sys.path.append("/path/to/vp_package-0.0.1")

    # Now, you can import the package and use it as usual
    import vp_package

How to use the package

Alternatively, you can install the downloaded archive with pip. In SageMaker, create a new Jupyter Notebook and include the following code:

    !pip install vp_package-0.0.1.tar.gz --target /home/ec2-user/SageMaker/vp_package

    import sys
    sys.path.append("/home/ec2-user/SageMaker/vp_package")

    from vp_package.vp_math import add, subtract

    print(add(2, 3))       # Output: 5
    print(subtract(5, 2))  # Output: 3

The !pip install command installs your package from the downloaded archive onto the SageMaker notebook instance. Installing by name (vp_package==0.0.1) would make pip look for the package on PyPI, where it has not been published. The --target argument tells pip where to install the package, in this case the /home/ec2-user/SageMaker/vp_package directory. Because that directory is not on Python's search path by default, we append it to sys.path before importing. After that, the functions from your package can be imported and used in SageMaker.

Note that in a production environment, you would want to host your package on a PyPI server or a private package repository instead of manually uploading it to S3.

Comments welcome!
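The extract-and-import step above can be tried without AWS access. The following is a minimal sketch, not the SageMaker workflow itself: it assumes an archive whose top-level folder is vp_package-0.0.1 (the layout an sdist produces), builds a stand-in archive locally with the standard library instead of downloading one from S3, extracts it, and puts the extracted folder on sys.path. The helper names build_demo_archive and extract_and_import are hypothetical and exist only for this illustration.

```python
import importlib
import sys
import tarfile
import tempfile
from pathlib import Path

# Hypothetical name for this demo; it mirrors the archive from the article.
ARCHIVE_ROOT = "vp_package-0.0.1"


def build_demo_archive(work_dir: Path) -> Path:
    """Write a tiny vp_package and archive it with an sdist-like layout."""
    pkg_dir = work_dir / ARCHIVE_ROOT / "vp_package"
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "vp_math.py").write_text(
        "def add(x, y):\n    return x + y\n\n\n"
        "def subtract(x, y):\n    return x - y\n"
    )
    archive = work_dir / f"{ARCHIVE_ROOT}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(work_dir / ARCHIVE_ROOT, arcname=ARCHIVE_ROOT)
    return archive


def extract_and_import(archive: Path, dest: Path):
    """Extract the archive and import vp_package.vp_math via sys.path."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    # The folder to add is the one that CONTAINS the vp_package directory.
    sys.path.insert(0, str(dest / ARCHIVE_ROOT))
    importlib.invalidate_caches()
    return importlib.import_module("vp_package.vp_math")


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    archive_path = build_demo_archive(tmp_path / "build")
    vp_math = extract_and_import(archive_path, tmp_path / "extracted")
    sum_result = vp_math.add(2, 3)
    diff_result = vp_math.subtract(5, 2)
    print(sum_result, diff_result)
```

The point of the sketch is the path that goes onto sys.path: it is the parent of the vp_package folder, not the vp_package folder itself.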

Coding and Maths · 2023-05-06

A Primer on Chi-squared test

The chi-square test is a statistical hypothesis test that is used to determine whether there is a significant association between two categorical variables. It is widely used in data analysis, particularly in fields such as social sciences, marketing, and biology, to examine relationships between categorical data. In this article, we will discuss the chi-square test, its applications, and how to perform it using Python.

Understanding the Chi-Square Test

The chi-square test is a non-parametric test that compares the observed frequencies of categorical data with the expected frequencies. The test is based on the chi-square statistic, which is calculated by summing, over all categories, the squared difference between the observed and expected frequencies divided by the expected frequency.

The chi-square test is used to test the null hypothesis that there is no significant association between the two variables. If the calculated chi-square value is greater than the critical value for the chosen significance level and degrees of freedom, we can reject the null hypothesis and conclude that there is a significant association between the variables.

There are two types of chi-square tests: the chi-square goodness of fit test and the chi-square test of independence. The goodness of fit test is used to test whether the observed data follows a particular distribution, while the test of independence is used to test whether there is a significant association between two categorical variables.

Applications of the Chi-Square Test

The chi-square test is widely used in research and data analysis, with a range of applications across various fields. Some common applications include:

Market research: To determine if there is a significant association between consumer behavior and demographic factors such as age group, gender, and income level.
Biology: To test whether different species of plants or animals are distributed randomly or in patterns in their environment.
Social sciences: To test whether there is a significant relationship between socio-economic status and educational attainment.
Quality control: To test whether the proportion of defective products differs between groups (for example, production lines or batches), based on the number of products that pass or fail inspection.

Performing the Chi-Square Test in Python

Python has several libraries that can be used to perform the chi-square test, including SciPy and StatsModels, with Pandas handling the data preparation. Here is an example of how to perform the chi-square test of independence using the chi2_contingency function in the SciPy library:

    import scipy.stats as stats
    import pandas as pd

    # Load data into a Pandas DataFrame
    data = pd.read_csv('my_data.csv')

    # Create a contingency table
    contingency_table = pd.crosstab(data['variable_1'], data['variable_2'])

    # Perform the chi-square test of independence
    chi2, p, dof, expected = stats.chi2_contingency(contingency_table)

    # Print the results
    print('Chi-square statistic:', chi2)
    print('P-value:', p)

In this example, we load data from a CSV file into a Pandas DataFrame, create a contingency table using the crosstab function, and then use the chi2_contingency function to perform the chi-square test of independence. The function returns the chi-square statistic, the p-value, the degrees of freedom, and the expected frequencies.

Conclusion

The chi-square test is a valuable statistical tool for examining the relationship between two categorical variables. By performing the test, we can determine whether there is a significant association between the variables and draw conclusions about the data. With the help of Python and its many data analysis libraries, we can easily perform the chi-square test and gain valuable insights from our data.

Comments welcome!
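To make the definition of the chi-square statistic concrete, here is a minimal sketch that computes it by hand and compares the result with SciPy. The 2x3 table of counts is made up purely for illustration (it is not data from this article), and correction=False is passed so that SciPy's result is the plain statistic described above.

```python
import numpy as np
from scipy import stats

# Hypothetical contingency table: rows = two groups, columns = three categories.
observed = np.array([[20, 15, 25],
                     [30, 25, 15]])

# Expected count for each cell under independence:
# (row total * column total) / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected_manual = row_totals @ col_totals / observed.sum()

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected.
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# Degrees of freedom: (rows - 1) * (columns - 1).
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# P-value: upper tail of the chi-square distribution.
p_manual = stats.chi2.sf(chi2_manual, dof_manual)

# The same test with SciPy, without continuity correction.
chi2_scipy, p_scipy, dof_scipy, expected_scipy = stats.chi2_contingency(
    observed, correction=False
)

print("manual:", chi2_manual, p_manual, dof_manual)
print("scipy: ", chi2_scipy, p_scipy, dof_scipy)
```

Comparing the p-value with the significance level is equivalent to comparing the statistic with the critical value, which is why the code never looks up a critical value explicitly.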

Coding and Maths · 2020-12-05

A Primer on ANOVA

ANOVA (Analysis of Variance) is a statistical method used to analyze and test the differences between the means of three or more groups. ANOVA compares the variation within groups to the variation between groups to determine whether the differences in means are statistically significant or just due to random chance.

The basic idea behind ANOVA is that if the variation between groups is significantly greater than the variation within groups, then there is evidence to suggest that the means of the groups are different. ANOVA allows us to test the null hypothesis that all of the group means are equal against the alternative hypothesis that at least one group mean is different from the others.

ANOVA is used in a wide range of applications, including biology, social sciences, economics, and engineering. It is often used in experimental research to test the effects of different treatments or interventions on a particular outcome.

There are several types of ANOVA, including one-way ANOVA, which compares the means of three or more groups that are unrelated, and repeated measures ANOVA, which compares the means of three or more groups that are related (i.e., the same group is measured under different conditions). ANOVA can be performed using software such as R, Python, or SPSS. In this article, we will be using Python.

Assumptions of ANOVA

ANOVA has several assumptions that should be met to ensure the validity and reliability of the test. The main assumptions are:

Normality: The dependent variable should be normally distributed in each group. One way to check this is by examining the distribution of the residuals (the differences between the observed values and the predicted values) for each group.
Homogeneity of variances: The variances of the dependent variable should be equal in each group. This can be checked by examining the variance of the residuals for each group.
Independence: The observations should be independent of each other. This means that there should be no systematic relationship between the observations in one group and the observations in another group.
Random sampling: The observations should be randomly sampled from each group in the population.

If these assumptions are not met, the results of the ANOVA may not be reliable. In addition, violating these assumptions can lead to a higher probability of type I errors (rejecting the null hypothesis when it is actually true) or type II errors (failing to reject the null hypothesis when it is actually false).

Types of ANOVA tests

One-way ANOVA: This test is used to compare the means of more than two independent groups defined by a single factor.
Two-way ANOVA: This test is used to compare the means of groups defined by two factors, so that the effect of each factor, and of their interaction, can be assessed.

One-way ANOVA

One-way ANOVA is a statistical method used to compare the means of three or more groups. It is used to determine whether there are significant differences between the means of the groups based on the variability within each group and the variability between groups. In this section, we will walk through how to perform a one-way ANOVA test using Python.

Performing a one-way ANOVA test in Python: To perform a one-way ANOVA test in Python, we can use the scipy.stats module. Here's an example code snippet:

    import scipy.stats as stats
    import pandas as pd

    # Create data
    group1 = [1, 2, 3, 4, 5]
    group2 = [6, 7, 8, 9, 10]
    group3 = [11, 12, 13, 14, 15]

    # Combine data into a pandas dataframe
    data = pd.DataFrame({'Group1': group1, 'Group2': group2, 'Group3': group3})

    # Perform one-way ANOVA test
    fvalue, pvalue = stats.f_oneway(data['Group1'], data['Group2'], data['Group3'])

    # Print results
    print('F-value:', fvalue)
    print('P-value:', pvalue)

In this example, we create three groups of data (group1, group2, and group3) and combine them into a pandas dataframe. We then use the f_oneway() function from the scipy.stats module to perform the one-way ANOVA test on the three groups. The output of the test includes the F-value and the p-value.

Interpreting the results: The F-value is the ratio of the variance between the groups to the variance within the groups. A higher F-value indicates that there is more variability between the groups relative to the variability within the groups. The p-value is a measure of the statistical significance of the F-value. A p-value less than 0.05 indicates that there is a statistically significant difference between the means of the groups.

In the example above, the F-value is 50 (a between-group mean square of 125 divided by a within-group mean square of 2.5) and the p-value is far below 0.05, which suggests that there is a statistically significant difference between the means of the three groups.

Two-way ANOVA

Two-way ANOVA is a statistical test that examines the effects of two different factors, and of their interaction, on the mean of a response variable. In this section, we will go over how to perform two-way ANOVA in Python using the statsmodels package.

To illustrate two-way ANOVA in Python, we will use a dataset called 'PlantGrowth'. It is a dataset of 30 plants, each receiving one of three different treatments (ctrl, trt1, and trt2), with their weight measured after a set period. We are interested in testing the effects of the treatments and the type of seed on the weight of the plants. The records used here look like this:

    weight  group  plant
    4.17    ctrl   plant_1
    5.58    ctrl   plant_2
    5.18    ctrl   plant_3
    6.11    ctrl   plant_4
    4.50    ctrl   plant_5
    4.61    ctrl   plant_6
    5.17    ctrl   plant_7
    4.53    ctrl   plant_8
    5.33    ctrl   plant_9
    5.14    trt1   plant_10
    4.81    trt1   plant_11
    4.17    trt1   plant_12
    4.41    trt1   plant_13
    3.59    trt1   plant_14
    5.87    trt1   plant_15
    3.83    trt1   plant_16
    6.03    trt1   plant_17
    4.89    trt1   plant_18
    4.32    trt2   plant_19
    4.69    trt2   plant_20
    6.31    trt2   plant_21
    5.12    trt2   plant_22
    5.54    trt2   plant_23
    5.50    trt2   plant_24
    5.37    trt2   plant_25
    5.29    trt2   plant_26
    4.92    trt2   plant_27

Here's how to perform a two-way ANOVA in Python:

Step 1: Load the required libraries and dataset

    import pandas as pd
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm

    data = pd.read_csv('PlantGrowth.csv')

Step 2: Create a model formula and fit the model

    model = ols('weight ~ C(treatment) + C(seed) + C(treatment):C(seed)', data).fit()

Here, 'weight' is the dependent variable, and 'treatment' and 'seed' are the two independent variables. Note that the formula expects columns with exactly these names in the CSV file. The records listed above have a 'group' column (the treatment) and no 'seed' column, so to run this formula on them you would rename 'group' to 'treatment' and add a 'seed' column recording the seed type of each plant.

Step 3: Perform the two-way ANOVA using anova_lm()

    anova_results = anova_lm(model, typ=2)
    print(anova_results)

The typ parameter specifies the type of sum of squares to use. Here, we use type 2 sum of squares. The anova_lm() function returns a table with the results of the ANOVA. The table includes the sum of squares, degrees of freedom, F-value, and p-value for each main effect and interaction effect.

Step 4: Interpret the results

The ANOVA table shows that both the main effects of 'treatment' and 'seed' are statistically significant, as well as the interaction effect between 'treatment' and 'seed'. This suggests that both the type of treatment and the type of seed have a significant effect on the weight of the plants, and that the effect of the treatment depends on the type of seed.

In conclusion, performing a two-way ANOVA in Python is straightforward using the statsmodels package. It is important to ensure that the assumptions of the ANOVA are met before interpreting the results.

Finally, to close, ANOVA is a powerful statistical technique that can be used to compare the means of two or more groups. Whether you are testing the effectiveness of different treatments, analyzing the impact of a categorical variable, or trying to determine if there are significant differences between groups, ANOVA can help you identify these differences and draw meaningful conclusions. By using Python and its many data analysis libraries, you can easily perform ANOVA and other statistical tests on your data and gain valuable insights that can inform your decisions and actions. With the right approach and tools, ANOVA can be a valuable addition to your statistical toolbox.

Comments welcome!
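Returning to the one-way example with the three groups 1-5, 6-10, and 11-15, the F-value can be recomputed by hand from the between-group and within-group variation described at the start of this article. The sketch below does exactly that and checks the result against scipy.stats.f_oneway; the variable names are chosen for illustration only.

```python
import numpy as np
from scipy import stats

groups = [
    np.array([1, 2, 3, 4, 5], dtype=float),
    np.array([6, 7, 8, 9, 10], dtype=float),
    np.array([11, 12, 13, 14, 15], dtype=float),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)            # number of groups
n_total = all_values.size  # total number of observations

# Between-group sum of squares: how far each group mean is from the grand mean.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: spread of the observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Mean squares divide each sum of squares by its degrees of freedom.
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

f_manual = ms_between / ms_within
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)

print("manual F:", f_manual, "p:", p_manual)
print("scipy  F:", f_scipy, "p:", p_scipy)
```

For these groups the between-group mean square is 250 / 2 = 125 and the within-group mean square is 30 / 12 = 2.5, so both calculations give F = 50.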

Coding and Maths · 2020-11-07

A Primer on T-tests

T-tests are a class of statistical tests used to determine whether there is a significant difference between the means of two groups of data. T-tests are often used to compare the mean of a sample to the population mean, or to compare the means of two independent samples or two paired samples.

The following are the most common types of t-tests, all of which we will cover:

One-sample t-test: This test is used to compare the mean of a single sample to a known or hypothesized population mean.
Independent samples t-test: This test is used to compare the means of two independent groups.
Paired samples t-test: This test is used to compare the means of two dependent (paired) groups.

T-tests have several assumptions that need to be met in order for the test to be valid. The most important assumptions are:

Normality: The data should follow a normal distribution, so that the sample means are normally distributed.
Independence: The samples should be independent of each other. This means that the observations in one sample should not be related to the observations in the other sample.
Homogeneity of variances: The variances of the two samples should be approximately equal. This means that the spread of the data should be similar in both groups.

If these assumptions are not met, the results of the t-test may be invalid or misleading. There are also different types of t-tests that make different assumptions. For example, the paired samples t-test assumes that the differences between paired observations are normally distributed, while the standard independent samples t-test assumes that the two samples have equal variances (scipy's ttest_ind accepts equal_var=False to run Welch's t-test, which drops this assumption). It's important to carefully consider the assumptions of the test and to use caution when interpreting the results.

How to perform T-tests in Python

One-sample t-test

A one-sample t-test is used to compare the mean of a single sample to a known or hypothesized population mean. This test is useful for determining whether a sample differs significantly from the population mean. To perform a one-sample t-test in Python, you can use the scipy.stats.ttest_1samp function. Here's an example:

    import numpy as np
    from scipy.stats import ttest_1samp

    # Generate a sample of data
    data = np.random.normal(loc=10, scale=2, size=100)

    # Set the hypothesized population mean
    pop_mean = 9

    # Perform the one-sample t-test
    t_stat, p_val = ttest_1samp(data, pop_mean)

    # Print the results
    print("t-statistic: {:.3f}".format(t_stat))
    print("p-value: {:.3f}".format(p_val))

In this example, we first generate a sample of data using the numpy.random.normal function, which draws from a normal distribution with the specified mean (loc) and standard deviation (scale). We then set the hypothesized population mean to 9 and perform the one-sample t-test using the ttest_1samp function, which takes two arguments: the sample data and the hypothesized population mean. The function returns two values: the t-statistic and the p-value. Finally, we print the results, formatting the t-statistic and p-value to three decimal places.

If the p-value is less than the significance level (usually 0.05), we can reject the null hypothesis and conclude that the sample mean differs significantly from the population mean. Otherwise, we fail to reject the null hypothesis and conclude that there is not enough evidence to suggest a significant difference between the sample mean and the population mean.

Independent samples t-test

An independent samples t-test is used to compare the means of two independent groups to determine if they are significantly different. This test is used when the two groups being compared are completely independent of each other. To perform an independent samples t-test in Python, we can use the scipy.stats.ttest_ind function from the SciPy library. Here's an example:

    import numpy as np
    from scipy.stats import ttest_ind

    # Generate two independent samples of data
    sample1 = np.random.normal(loc=10, scale=2, size=100)
    sample2 = np.random.normal(loc=12, scale=2, size=100)

    # Perform the independent samples t-test
    t_stat, p_val = ttest_ind(sample1, sample2)

    # Print the results
    print("t-statistic: {:.3f}".format(t_stat))
    print("p-value: {:.3f}".format(p_val))

In this example, we first generate two independent samples of data using the numpy.random.normal function. We then perform the independent samples t-test using the ttest_ind function, which takes the two samples being compared as arguments. The function returns two values: the t-statistic and the p-value. Finally, we print the results, formatting the t-statistic and p-value to three decimal places.

If the p-value is less than the significance level (usually 0.05), we can reject the null hypothesis and conclude that the means of the two groups are significantly different. Otherwise, we fail to reject the null hypothesis and conclude that there is not enough evidence to suggest a significant difference between the means of the two groups.

Paired samples t-test

A paired samples t-test is a statistical test used to determine whether there is a statistically significant difference between the means of two related groups, for example the same subjects measured before and after a treatment. To perform a paired samples t-test in Python, we can use the scipy.stats module, which contains a variety of statistical functions including the ttest_rel() function. This function computes the t-test for two related samples of scores. Here is an example code snippet:

    import numpy as np
    from scipy.stats import ttest_rel

    # Create two related random samples of data
    before = np.random.normal(5, 1, 100)
    after = before + np.random.normal(1, 0.5, 100)

    # Compute the t-test
    t_stat, p_val = ttest_rel(before, after)

    # Print the results
    print("t-statistic: {}".format(t_stat))
    print("p-value: {}".format(p_val))

In this example, we first create two related random samples of data using the numpy.random.normal() function. We create the second sample by adding random noise (with mean 1 and standard deviation 0.5) to the first sample. We then compute the paired samples t-test for these two samples using the ttest_rel() function. The function returns two values: the t-statistic and the p-value. Finally, we print the results of the test.

If the p-value is less than the significance level (usually 0.05), we can reject the null hypothesis and conclude that the means of the two related groups are significantly different. Otherwise, we fail to reject the null hypothesis and conclude that there is not enough evidence to suggest a significant difference between the means of the two related groups.

It's important to note that a paired samples t-test assumes that the differences between the pairs of observations are normally distributed. If this assumption is not met, other tests or transformations may be needed. Additionally, like any statistical test, it's important to carefully consider the context and limitations of the test and to avoid drawing causal conclusions from statistical associations alone.

To close, t-tests are useful because they provide a simple and easy-to-interpret method for comparing two groups of data. They are widely used in a variety of fields including psychology, medicine, education, and more. However, it's important to note that t-tests have certain assumptions, such as normality of the data and equal variances, which need to be met for the test to be valid. It's also important to use caution when interpreting t-test results and to consider the context and limitations of the test.

Comments welcome!
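Since the equal-variance assumption comes up repeatedly above, here is a small sketch of what can be done when it is doubtful. It is one possible approach under stated assumptions, not the only one: the two samples are simulated with deliberately different spreads, Levene's test (scipy.stats.levene) is used as a check on the variances, and Welch's t-test is run by passing equal_var=False to ttest_ind. The Welch statistic is also computed by hand to show what the function does.

```python
import numpy as np
from scipy import stats

# Simulated data for illustration: different means AND different spreads.
rng = np.random.default_rng(42)
sample1 = rng.normal(loc=10, scale=1, size=100)
sample2 = rng.normal(loc=12, scale=4, size=60)

# Levene's test: the null hypothesis is that the variances are equal.
levene_result = stats.levene(sample1, sample2)
print("Levene p-value:", levene_result.pvalue)

# Welch's t-test does not assume equal variances.
welch_result = stats.ttest_ind(sample1, sample2, equal_var=False)
t_welch = welch_result.statistic
p_welch = welch_result.pvalue

# The same statistic by hand: difference in means divided by the
# standard error built from each sample's own variance.
standard_error = np.sqrt(
    sample1.var(ddof=1) / sample1.size + sample2.var(ddof=1) / sample2.size
)
t_manual = (sample1.mean() - sample2.mean()) / standard_error

print("Welch t (scipy): ", t_welch)
print("Welch t (manual):", t_manual)
print("Welch p-value:   ", p_welch)
```

A small Levene p-value is a hint that the variances differ, in which case the Welch version is the safer choice; some analysts simply use Welch's test by default.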

Coding and Maths · 2020-10-03

Statistical Hypothesis Testing

Coding and Maths · 2020-09-05

Statistical Distributions

In this article we will cover some distributions that I have found useful while analysing data. I have split them based on whether they are for a continuous or a discrete random variable. First I give a small theoretical introduction about the distribution, its probability density function, and then how to use python to represent it graphically. Continuous Distributions: Uniform distribution Normal Distribution, also known as Gaussian distribution Standard Normal Distribution - case of normal distribution where loc or mean = 0 and scale or sd = 1 Gamma distribution - exponential, chi-squared, erlang distributions are special cases of the gamma distribution Erlang distribution - special form of Gamma distribution when a is an integer ? Exponential distribution - special form of Gamma distribution with a=1 Lognormal - not covered Chi-Squared - not covered Weibull - not covered t Distribution - not covered F Distribution - not covered Discrete Distributions: Poisson distribution is a limiting case of a binomial distribution under the following conditions: n tends to infinity, p tends to zero and np is finite Binomial Distribution Negative Binomial - not covered Bernoulli Distribution is a special case of the binomial distribution where a single trial is conducted n=1 Geometric - not covered Lets import some basic libraries that we will be using: import numpy as np import pandas as pd import scipy.stats as spss import plotly.express as px import seaborn as sns Continuous Distributions Uniform distribution As the name suggests, in uniform distribution the probability of all outcomes is same. The shape of this distribution is a rectange. Now, lets plot this using python. First we will generate an array of random variables using scipy. 
We will specifically use scipy.stats.uniform.rvs function with following three inputs: size specifies number of random variates loc corresponds to mean scale corresponds to standard deviation rv_array = spss.uniform.rvs(size=10000, loc = 10, scale=20) Now we can plot this using the plotly library or the seaborn library. Infact seaborn has a couple of different function, namely the distplot and the histplot, both of which can be used to visually view the unoform data. Lets see the examples one by one: We can directly plot the data from the array: px.histogram(rv_array) # plotted using plotly express sns.histplot(rv_array, kde=True) # plotted using seaborn Or we can convert array into a dataframe and then plot the data frame: rv_df = pd.DataFrame(rv_array, columns=['value_of_random_variable']) px.histogram(rv_df, x='value_of_random_variable', nbins=20) # plotted using plotly express sns.histplot(data=rv_df, x='value_of_random_variable', kde=True) # plotted using seaborn Normal Distribution, also known as Gaussian distribution: The normal distribution is the most important probability distribution in statistics because it fits many natural phenomena. Normal distribution is a limiting case of Poisson distribution with the parameter lambda tends to infinity. Additionally since poisson distribution is a for of binomial distribution, normal distribution is also a form of binomial distribution. This distribution has a bell-shaped density curve described by its mean and standard deviation. The mean represents the location and the sd represents the spread of the distribution. The curve represents that the data near the mean occurrs more frequently than the data far from the mean. Lets plot it using seaborn: rv_array = spss.norm.rvs(size=10000,loc=10,scale=100) # size specifies number of random variates, loc corresponds to mean, scale corresponds to standard deviation sns.histplot(rv_array, kde=True) We can add x and y labels, change the number of bins, color of bars, etc. 
With distplot we can supply additional arguments for adjusting the width of the bars, transparency, etc. Note that distplot has been deprecated since seaborn 0.11 in favour of histplot and displot, so newer versions emit a warning.

    ax = sns.distplot(rv_array, bins=100, kde=True, color='cornflowerblue', hist_kws={"linewidth": 15, 'alpha': 1})
    ax.set(xlabel='Normal Distribution', ylabel='Frequency')

Standard Normal Distribution

This is a special case of the normal distribution where mean = 0 and sd = 1. Let's plot it using seaborn:

    rv_array = spss.norm.rvs(size=10000, loc=0, scale=1)
    sns.histplot(rv_array, kde=True)

Gamma Distribution

The gamma distribution is a two-parameter family of continuous probability distributions. The exponential, chi-squared and Erlang distributions are special cases of the gamma distribution. Let's plot it using seaborn:

    rv_array = spss.gamma.rvs(a=5, size=10000)  # size specifies number of random variates, a is the shape parameter
    sns.distplot(rv_array, kde=True)

Erlang distribution

Special case of the gamma distribution when the shape parameter a is an integer.

Exponential distribution

Special case of the gamma distribution with a=1. The exponential distribution describes the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate. Let's plot it using seaborn:

    rv_array = spss.expon.rvs(scale=1, loc=0, size=1000)  # size specifies number of random variates, loc shifts the distribution, scale = 1/rate (mean = loc + scale, standard deviation = scale)
    sns.distplot(rv_array, kde=True)

Discrete Distributions

Binomial Distribution

Distribution of the number of successes in n trials where each trial has only two possible outcomes, such as success or failure, gain or loss, win or lose. The probability of success is the same for all the trials, the two outcomes need not be equally likely, and each trial is independent of the others. The probability of observing k successes in n trials is given by the equation:

    f(k;n,p) = nCk * (p^k) * ((1-p)^(n-k))

Where,
nCk = (n)! / ((k)! * (n-k)!)
n = total number of trials
p = probability of success in each trial

Let's plot it using seaborn:

    rv_array = spss.binom.rvs(n=10, p=0.8, size=10000)  # n = number of trials, p = probability of success, size = number of times to repeat the trials
    sns.distplot(rv_array, kde=True)

Poisson Distribution

A Poisson random variable is typically used to model the number of times an event happens in a time interval. For example, the number of users registered for a web service in an interval can be thought of as a Poisson process. The Poisson distribution is described in terms of the rate at which the events happen. The average number of events in an interval is designated λ (lambda), also called the event rate or rate parameter; scipy calls it mu. The probability of observing k events in an interval is given by the equation:

    P(k events in interval) = e^(-lambda) * (lambda^k / k!)

The Poisson distribution is a limiting case of a binomial distribution under the following conditions:
The number of trials is indefinitely large, or n tends to infinity
The probability of success for each trial is the same and indefinitely small, or p tends to zero
np = lambda is finite

Let's plot it using seaborn:

    rv_array = spss.poisson.rvs(mu=3, size=10000)  # mu is the rate (lambda), size specifies number of random variates
    sns.distplot(rv_array, kde=True)

Bernoulli distribution

This distribution has only two possible outcomes, 1 (success) and 0 (failure), and a single trial, for example, a coin toss. The random variable X which has a Bernoulli distribution takes the value 1 with the probability of success, p, and the value 0 with the probability of failure, q = 1-p. The probabilities of success and failure need not be equal.

Probability mass function of the Bernoulli distribution:

    f(k;p) = (p^k) * ((1-p)^(1-k))

The Bernoulli distribution is a special case of the binomial distribution where a single trial is conducted (n=1). Let's plot it using seaborn:

    rv_array = spss.bernoulli.rvs(size=10000, p=0.6)  # p = probability of success, size = number of times to repeat the trial
    sns.distplot(rv_array, kde=True)

Hope you found this summary of distributions useful. I refer to this from time to time to jog my memory on the various distributions. Comments welcome!
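As a quick numerical check of the discrete formulas in this post, the sketch below evaluates the binomial, Poisson and Bernoulli probability mass functions with plain Python. The helper names (binom_pmf, poisson_pmf, bernoulli_pmf) and the chosen values of n, p and lambda are illustrative assumptions, not scipy functions; scipy.stats offers its own pmf methods for the same purpose.

```python
import math

def binom_pmf(k, n, p):
    """P(k successes in n independent trials, each with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(k events in an interval) when the average number of events is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def bernoulli_pmf(k, p):
    """P(X = k) for a single trial, where k is 0 or 1."""
    return p**k * (1 - p)**(1 - k)

# Bernoulli is the binomial distribution with n = 1
same_as_binomial = math.isclose(bernoulli_pmf(1, 0.6), binom_pmf(1, 1, 0.6))

# Poisson as a limit of the binomial: n large, p small, n * p = lam
lam = 3
n = 100_000
binomial_value = binom_pmf(2, n, lam / n)
poisson_value = poisson_pmf(2, lam)

print(same_as_binomial)
print(binomial_value, poisson_value)  # the two values should nearly agree
```

Increasing n (while keeping n * p fixed at lambda) should bring the two printed values closer together, which is the limiting behaviour described above.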

Coding and Maths · 2020-08-01

Visualize data using SAS

Coding and Maths · 2020-07-04

Visualize data using Python

This is the second of a series of articles that I will write to give a gentle introduction to statistics. In this article we will cover how we can visualize data using various charts and how to read them. I will show how to create these charts using Python and will include code snippets as well. For a full version of the code visit my GitHub repository.

Python has many libraries that allow creating visually appealing charts. In this article we will work with the built-in tips dataset and then plot using the following libraries:

    import seaborn as sns
    tips = sns.load_dataset("tips")  # tips dataset can be loaded from seaborn
    sns.get_dataset_names()  # to get a list of other available datasets

    import plotly.express as px
    tips = px.data.tips()  # tips dataset can be loaded from plotly
    # data_canada = px.data.gapminder().query("country == 'Canada'")

    import pandas as pd
    tips.to_csv('/Users/vivekparashar/Downloads/tips.csv')  # we can save the dataset into a csv and then load it into SAS or R for plotting

    import altair as alt
    import statsmodels.api as sm

Let's take a quick look at how the tips dataset is structured:

We will cover the following charts in this article:

Dot plot shows changes between two (or more) points in time or between two (or more) conditions.

    # Using plotly library
    # select the numeric column before calling mean(); recent pandas versions raise an error when averaging non-numeric columns
    t = tips.groupby(['day','sex'])[['total_bill']].mean().reset_index()
    px.scatter(t, x='day', y='total_bill', color='sex', title='Average bill by gender by day', labels={'day':'Day of the week', 'total_bill':'Average Bill in $'})

Bar (horizontal and vertical) chart is used when you want to show a distribution of data points or perform a comparison of metric values across different subgroups of your data.
    # Using pandas plot
    tips.groupby('sex')['total_bill'].mean().plot(kind='bar')
    tips.groupby('sex')['tip'].mean().plot(kind='barh')
    # Using plotly
    t = tips.groupby(['day','sex'])[['total_bill']].mean().reset_index()
    px.bar(t, x='day', y='total_bill')  # vertical
    px.bar(t, x='total_bill', y='day', orientation='h')  # horizontal

Stacked bar chart is useful when you want to show more than one categorical variable per bar.

    # using pandas plot; kind='barh' for horizontal plot
    # need to unstack one of the levels and fill na values
    tips.groupby(['day','sex'])[['total_bill']].mean()\
        .unstack('sex').fillna(0)\
        .plot(kind='bar', stacked=True)
    # Using plotly
    t = tips.groupby(['day','sex'])[['total_bill']].mean().reset_index()
    px.bar(t, x="day", y="total_bill", color="sex", title="Average bill by Gender and Day")  # vertical
    px.bar(t, x="total_bill", y="day", color="sex", title="Average bill by Gender and Day", orientation='h')  # horizontal

Boxplot (horizontal and vertical): In a box plot, numerical data is divided into quartiles, and a box is drawn between the first and third quartiles, with an additional line drawn at the second quartile to mark the median. In some box plots, the minimums and maximums outside the first and third quartiles are depicted with lines, which are often called whiskers.
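To make the quartiles and whiskers concrete, here is a minimal sketch that computes the numbers a box plot is drawn from. The bills list is made up for illustration, and the 1.5 x IQR rule for where the whiskers end is a common convention assumed here; plotting libraries may compute quartiles slightly differently.

```python
import statistics

# Made-up bill amounts, already sorted for readability (hypothetical data)
bills = [10.3, 12.5, 14.8, 16.0, 17.9, 20.3, 24.6, 48.3]

# First quartile, median (second quartile) and third quartile
q1, median, q3 = statistics.quantiles(bills, n=4, method="inclusive")
iqr = q3 - q1  # height of the box

# Assumed convention: whiskers reach the most extreme points within 1.5 * IQR of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
whisker_low = min(x for x in bills if x >= lower_fence)
whisker_high = max(x for x in bills if x <= upper_fence)

# Points beyond the fences are usually drawn as individual outlier markers
outliers = [x for x in bills if x < lower_fence or x > upper_fence]

print(q1, median, q3)
print(whisker_low, whisker_high, outliers)
```

With this data the single large bill falls outside the upper fence, so it would be drawn as a separate point rather than stretching the whisker.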
    # using pandas plot
    tips[['total_bill']].plot(kind='box')
    # in plotly and seaborn, pass y=variable for a vertical box plot and x=variable for a horizontal one
    # using plotly
    px.box(tips, y='total_bill')
    # using seaborn
    sns.boxplot(y=tips["total_bill"])

Violin plot is a variation of the box plot that also shows the density of the data.

    # Using seaborn
    sns.violinplot(y=tips.total_bill)
    sns.violinplot(data=tips, x='day', y='total_bill', hue='smoker', palette='muted', split=True, scale='count', inner='quartile', order=['Thur','Fri','Sat','Sun'])
    sns.catplot(x='sex', y='total_bill', hue='smoker', col='time', data=tips, kind='violin', split=True, height=4, aspect=.7)

Histogram is a visual representation of the frequency distribution of your data. The frequencies are represented by bars.

    # using pandas plot
    tips.total_bill.plot(kind='hist')
    # using plotly
    px.histogram(tips, x="total_bill")
    # using seaborn
    sns.histplot(data=tips, x="total_bill")
    # using altair
    alt.Chart(tips).mark_bar().encode(alt.X('total_bill:Q', bin=True), y='count()')

Probability Plot is a way of visually comparing data coming from different distributions. It can be of two types - pp plot or qq plot.

pp plot (Probability-to-Probability) plots the cumulative distribution functions (CDFs) of two distributions (empirical and theoretical) against each other.

qq plot (Quantile-to-Quantile) is used to compare the quantiles of two distributions. Quantiles are cut points that divide a distribution into intervals of equal probability. The distributions may be theoretical or sample distributions from a process, etc. Normal probability plot is a special case of the qq plot.
It is a way of checking whether the dataset is normally distributed or not.

    # using statsmodels
    from statsmodels.graphics.gofplots import ProbPlot
    import numpy as np
    ProbPlot(np.array(tips.total_bill)).ppplot(line='s')
    ProbPlot(np.array(tips.total_bill)).qqplot(line='s')

Scatter plot shows the relationship between two numerical variables.

    # using plotly
    px.scatter(tips, x='total_bill', y='tip', color='sex', size='size', hover_data=['day'])
    # using pandas plot
    tips.plot(x='total_bill', y='tip', kind='scatter')

Reg plot fits a regression line between two variables and helps to visualize their linear relationship.

    # using seaborn
    sns.regplot(x="total_bill", y="tip", data=tips, marker='+')
    # for discrete variables we can add jitter to see overlapping points
    sns.regplot(x="size", y="total_bill", data=tips, x_jitter=.1)

Line plot is used to visualize the value of something over time.

    # using pandas plot
    tips['total_bill'].plot(kind='line')
    # using plotly
    px.line(tips, y='total_bill', title='Total bill')
    # select the numeric column before calling sum()
    t = tips.groupby('day')[['total_bill']].sum().reset_index()
    px.line(t, x='day', y='total_bill', title='Total bill by day')
    # using altair
    alt.Chart(t).mark_line().encode(x='day', y='total_bill')
    # using seaborn
    sns.lineplot(data=t, x='day', y='total_bill')

Area plot is like a line chart in terms of how data values are plotted on the chart and connected using line segments. In an area plot, however, the area between the line segments and the x-axis is filled with color.
    # using pandas plot
    tips.groupby('day')[['total_bill']].sum().plot(kind='area')
    # stacked area can be done using pandas.plot as well
    t = tips.groupby(['day','sex']).count()[['total_bill']].reset_index()
    t_pivoted = t.pivot(index='day', columns='sex', values='total_bill')
    t_pivoted.plot.area()
    # using plotly
    px.area(t, x='day', y='total_bill', color='sex', line_group='sex')
    # using altair
    alt.Chart(t).mark_area().encode(x='day', y='total_bill')

Pie chart is a circular statistical graphic, which is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice is proportional to the quantity it represents.

    # using pandas plot
    tips.groupby('sex').count()['tip'].plot(kind='pie')
    # using plotly
    px.pie(tips, values='tip', names='day')

Sunburst chart is ideal for displaying hierarchical data. Each level of the hierarchy is represented by one ring or circle, with the innermost circle as the top of the hierarchy.

    px.sunburst(tips, path=['sex', 'day', 'time'], values='total_bill', color='day')

Radar chart is a graphical method of displaying multivariate data in the form of a two-dimensional chart of three or more quantitative variables represented on axes starting from the same point.

    # using plotly
    t = tips.groupby('day')[['total_bill']].mean().reset_index()
    px.line_polar(t, r='total_bill', theta='day', line_close=True)

The best way to get better at visualization is through practice. What I have found useful is participating in a weekly visualization challenge called TidyTuesday. Comments welcome!

Coding and Maths · 2020-06-06

Describe your data using Python

This is the first of a series of articles that I will write to give a gentle introduction to statistics. In this article we will introduce some basic statistical concepts and learn how to use basic statistics to help you describe your data. We will cover the following topics in this article:

The difference between a population and a sample
The difference between Descriptive and Inferential statistics
Different types of variables
Types of descriptive statistics
Normal or Gaussian distribution

The difference between a population and a sample:

Population denotes a large group consisting of elements having at least one common feature; it is the complete set of observations.
Sample is a finite subset of observations drawn from a population.

We get a sample from the population in either of the following ways:

Representative sampling - here the sample's characteristics are similar to the population characteristics
- A simple random sample is the most common approach to obtain a representative sample
- A systematic random sample
- A cluster random sample
- A stratified random sample
Convenience sampling - here we collect the sample from a section of the population that is easily available

The difference between Descriptive and Inferential statistics:

Descriptive statistics - it's all about organizing, describing and summarizing data
- Exploratory data analysis (EDA)
- Measures of location - such as Mean, Median, Mode
- Measures of variability or dispersion - such as Variance, Standard deviation, Range, Inter quartile range (IQR)
Inferential statistics - it's all about drawing conclusions about a population from analysis of a random sample drawn from the population
- Exploratory modelling - how is x related to y?
- Predictive modelling - if you know x, can you predict y?

Different types of variables:

Quantitative
- Discrete: a variable whose value is obtained by counting. Example, number of students in a class
- Continuous: a variable whose value is obtained by measuring. Example, height of all students in a class
- Interval: a scale of measurement where values are ordered and evenly spaced but there is no true zero. Example, temperature in degrees Celsius
- Ratio: a scale of measurement where values are ordered, evenly spaced and have a true zero, so ratios are meaningful. Example, height or weight
Qualitative or Categorical
- Nominal: example gender - female or male
- Ordinal: example size - small, medium, or large

Types of descriptive statistics:

Measures of location: mainly measures of central tendency

Mean: sum of all values divided by the number of values

    import seaborn as sns
    tips = sns.load_dataset('tips')
    tips.mean(numeric_only=True)  # shows mean of all numeric variables

Median: middle value in a given sequence of values ordered by rank

    tips.median(numeric_only=True)  # shows median of all numeric variables

Mode: most frequent value in a set of values

    tips.mode()  # shows mode of all variables

Measures of variability, spread or dispersion

Range: Maximum value - Minimum value

    bill_range = tips.total_bill.max() - tips.total_bill.min()  # range (named bill_range to avoid shadowing Python's built-in range)

IQR (Inter quartile range): 75th percentile - 25th percentile

    tips.total_bill.quantile(.75) - tips.total_bill.quantile(.25)  # IQR

Variance: measure of variability of data around the mean

    tips.total_bill.var()  # variance of total_bill variable

Standard deviation: how spread out the data is around the mean; it is the square root of the variance

    tips.total_bill.std()  # standard deviation of total_bill variable

Coefficient of variation (C.V.): standard deviation expressed as a percentage of the mean

    cv = lambda x: x.std() / x.mean() * 100
    cv(tips.total_bill)

Measures of symmetry and peakedness: Skewness measures symmetry and Kurtosis measures peakedness

Normal or Gaussian distribution

This is one of the most common statistical distributions. The curve of this distribution is shaped like a bell. The shape of the bell depends on the mean and standard deviation of the data: the larger the standard deviation, the wider the distribution. A quick first check for normality is to see if the mean and median are nearly equal; this only indicates symmetry, so it is a necessary sign rather than proof.

Skewness and Kurtosis

Skewness measures the tendency of data to be spread out more on one side of the mean than the other.
- A negative value indicates the data is left skewed
- A positive value indicates the data is right skewed
- A value close to zero is expected for normally distributed data

    import scipy.stats as s
    s.skew(tips.total_bill, bias=False)  # calculate sample skewness

Kurtosis measures the tendency of data to be concentrated around the center or the tails. scipy reports excess kurtosis by default, for which a normal distribution scores zero.
- Platykurtic: a negative value indicates lower than normal peakedness
- Leptokurtic: a positive value indicates higher than normal peakedness
- Mesokurtic: a value close to zero is expected for normally distributed data

    import scipy.stats as s
    s.kurtosis(tips.total_bill, bias=False)  # calculate sample kurtosis

Comments welcome!
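To show what the skewness and kurtosis numbers in this article measure, the sketch below computes them from central moments using only the standard library. The function names and the two small datasets are made up for illustration. This is the simple population (biased) version, so its values will differ slightly from the scipy calls above, which use bias=False.

```python
import statistics

def central_moment(xs, k):
    """Average of (x - mean)**k over the data."""
    mean = statistics.fmean(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)

def skewness(xs):
    """Third central moment scaled by the variance to the power 1.5."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    """Fourth central moment scaled by the squared variance, minus 3 (the normal value)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]            # hypothetical, evenly spread data
right_skewed = [1, 1, 2, 2, 3, 10]     # hypothetical data with one large value

for name, data in [("symmetric", symmetric), ("right_skewed", right_skewed)]:
    print(name,
          "mean:", statistics.fmean(data),
          "median:", statistics.median(data),
          "skewness:", skewness(data),
          "excess kurtosis:", excess_kurtosis(data))
```

For the right-skewed list the mean is pulled above the median by the single large value, which ties in with the mean-versus-median tip given earlier.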

Coding and Maths · 2020-05-02

An Introduction to GitHub

A three part article series on version control using Git and GitHub. This is the third article in the series in which I will give a very brief introduction to GitHub. This will allow most readers to understand enough to utilize it for version control during development.

What is GitHub?

GitHub is a popular platform for hosting and sharing code repositories, and is widely used for version control and collaborative coding projects. If you're new to using GitHub for version control, here are some key things to keep in mind:

Create a GitHub account: The first step in using GitHub is to create an account. You can sign up for a free account, which supports both public and private repositories, or a paid plan, which adds additional features.

Create a new repository: Once you have an account, you can create a new repository by clicking the "New repository" button on your GitHub dashboard. You can choose to make the repository public or private, and can add a README file and other files as needed.

Clone the repository to your local machine: Once you have created a repository on GitHub, you can clone it to your local machine using Git. This allows you to make changes to the code locally, and push those changes back to the remote repository on GitHub.

Make changes and commit them: Once you have cloned the repository to your local machine, you can make changes to the code and commit those changes to Git. Be sure to write clear and descriptive commit messages that explain the changes made.

Push changes to the remote repository: After committing changes to Git, you can push those changes back to the remote repository on GitHub. This allows other team members to see the changes and collaborate on the code.

Use pull requests for code reviews: When working on a team, it's a good practice to use pull requests to review code changes before merging them into the main branch.
This allows other team members to review the code and provide feedback before changes are merged. Use branches for new features or bug fixes: When working on a new feature or bug fix, it’s important to create a new branch in Git rather than making changes directly to the main branch. This keeps the main branch stable and allows for easier collaboration with other team members. By keeping these key things in mind when using GitHub for version control, you can help ensure that your codebase is well-organized, well-documented, and easy to collaborate on with other team members. Components of GitHub Now, let us explore some of the key components of GitHub. Repository, branch Repository is a project’s folder and contains all of the project files (including documentation), and stores each file’s revision history. Branch is a parallel version of a repository. It is contained within the repository, but does not affect the primary or master branch allowing you to work freely without disrupting the “live” version. When you’ve made the changes you want to make, you can merge your branch back into the master branch to publish your changes. Commit, revert Commit, or “revision”, is an individual change to a file (or set of files). When you make a commit to save your work, Git creates a unique ID (a.k.a. the “SHA” or “hash”) that allows you to keep record of the specific changes committed along with who made them and when. Commits usually contain a commit message which is a brief description of what changes were made. Revert - when you revert a pull request on GitHub, a new pull request is automatically opened, which has one commit that reverts the merge commit from the original merged pull request. In Git, you can revert commits with git revert. Push, pull, fetch, merge Push means to send your committed changes to a remote repository on GitHub.com. For instance, if you change something locally, you can push those changes so that others may access them. 
Pull refers to fetching changes from a remote and merging them into your local branch. For instance, if someone has edited the remote file you're both working on, you'll want to pull in those changes to your local copy so that it's up to date. See also fetch.

Pull requests are proposed changes to a repository submitted by a user and accepted or rejected by a repository's collaborators. Like issues, pull requests each have their own discussion forum.

Fetch - when you use git fetch, you download changes from the remote repository into your remote-tracking branches without merging them into your local working branch. Unlike git pull, fetching allows you to review changes before merging them into your local branch.

Merge takes the changes from one branch (in the same repository or from a fork), and applies them into another. This often happens as a "pull request" (which can be thought of as a request to merge), or via the command line. A merge can be done through a pull request via the GitHub.com web interface if there are no conflicting changes, or can always be done via the command line.

Fork, clone, download

Fork is a personal copy of another user's repository that lives on your account. Forks allow you to freely make changes to a project without affecting the original upstream repository. You can also open a pull request in the upstream repository and keep your fork synced with the latest changes since both repositories are still connected.

Clone is a copy of a repository that lives on your computer instead of on a website's server somewhere, or the act of making that copy. When you make a clone, you can edit the files in your preferred editor and use Git to keep track of your changes without having to be online. The repository you cloned is still connected to the remote version so that you can push your local changes to the remote to keep them synced when you're online.

Download option allows you to download the project folder as a zip file from GitHub to your local machine. This does not bring the .git folder (and so no revision history), so cloning the repository using its HTTPS link is a better option.

Comments welcome!

Coding and Maths · 2019-10-05

Git Cheatsheet

A three part article series on version control using Git and GitHub. This is the second article in the series in which I will share my Git cheatsheet. This will enable the reader to quickly recall important commands to aid development. Key things to remember when using Git Always create a new branch for new features or bug fixes: When working on a new feature or bug fix, it’s important to create a new branch in Git rather than making changes directly to the main branch. This keeps the main branch stable and allows for easier collaboration with other team members. Commit early and often: It’s a good practice to commit changes to Git as often as possible, rather than waiting until the end of the day or the end of a coding session. This makes it easier to track changes and roll back to previous versions if needed. Write clear commit messages: When committing changes to Git, be sure to write clear and descriptive commit messages that explain the changes made. This makes it easier for other team members to understand the changes and can save time during code reviews. Use Git pull requests for code reviews: When working on a team, it’s a good practice to use Git pull requests to review code changes. This allows other team members to review the code and provide feedback before changes are merged into the main branch. Keep your Git repository organized: Make sure that your Git repository is organized and easy to navigate, with clear file and folder structures. This makes it easier to find specific files and makes the code repository more manageable over time. By keeping these five key things in mind when using Git for version control, you can help ensure that your codebase is well-organized, well-documented, and easy to collaborate on with other team members. 
Basic commands

Getting & Creating Projects

Command | Description
git init | Initialize a local Git repository
git clone ssh://git@github.com/[username]/[repository-name].git | Create a local copy of a remote repository

Basic Snapshotting

Command | Description
git status | Check status
git add [file-name.txt] | Add a file to the staging area
git add -A | Add all new and changed files to the staging area
git commit -m "[commit message]" | Commit changes
git rm -r [file-name.txt] | Remove a file (or folder)

Branching & Merging

Command | Description
git branch | List branches (the asterisk denotes the current branch)
git branch -a | List all branches (local and remote)
git branch [branch name] | Create a new branch
git branch -d [branch name] | Delete a branch
git push origin --delete [branch name] | Delete a remote branch
git checkout -b [branch name] | Create a new branch and switch to it
git checkout -b [branch name] origin/[branch name] | Create a local branch from a remote branch and switch to it
git branch -m [old branch name] [new branch name] | Rename a local branch
git checkout [branch name] | Switch to a branch
git checkout - | Switch to the branch last checked out
git checkout -- [file-name.txt] | Discard changes to a file
git merge [branch name] | Merge a branch into the active branch
git merge [branch name 1] [branch name 2] | Merge the listed branches into the active branch
git stash | Stash changes in a dirty working directory
git stash clear | Remove all stashed entries

Sharing & Updating Projects

Command | Description
git push origin [branch name] | Push a branch to your remote repository
git push -u origin [branch name] | Push changes to remote repository (and remember the branch)
git push | Push changes to remote repository (remembered branch)
git push origin --delete [branch name] | Delete a remote branch
git pull | Update local repository to the newest commit
git pull origin [branch name] | Pull changes from remote repository
git remote add origin ssh://git@github.com/[username]/[repository-name].git | Add a remote repository
git remote set-url origin ssh://git@github.com/[username]/[repository-name].git | Set a repository's origin URL to SSH

Inspection & Comparison

Command | Description
git log | View changes
git log --summary | View changes (detailed)
git log --oneline | View changes (briefly)
git diff [source branch] [target branch] | Preview changes before merging

Comments welcome!

Coding and Maths · 2019-09-07

An Introduction to Git

A three part article series on version control using Git and GitHub. This is the first article in the series in which I will give a very brief introduction to Git. This will allow most readers to understand enough to utilize it for version control during development. What is Git? Git is a popular version control system that allows developers to manage and track changes to their code over time. It’s an essential tool for software development teams, as it helps to ensure that changes to code are properly tracked and documented, and makes it easier for developers to collaborate and work together. Here’s an overview of what Git is and how it works. Git is a distributed version control system, meaning that every developer working on a project has their own copy of the code repository on their local machine. This allows developers to work on their own changes and then merge them back into the main repository when they are ready. Git is also designed to be very fast and efficient, making it ideal for managing large codebases and complex projects. How does Git work? Git works by tracking changes to files and directories in a code repository. When a developer makes changes to the code, they create a new “commit” that documents the changes they made. Git stores these commits in a tree-like structure, with each commit representing a snapshot of the code at a particular point in time. This allows developers to easily view the history of changes to the code over time, and to revert to previous versions if necessary. Git also allows developers to create branches, which are essentially separate versions of the code repository that can be worked on independently. Branches are useful for trying out new features or making experimental changes without affecting the main codebase. Once changes have been tested and reviewed, they can be merged back into the main branch. 
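The idea of commits as linked snapshots and branches as movable pointers can be sketched with a toy model in Python. Everything here (Commit, ToyRepo, the file names) is a hypothetical illustration of the concept and is not how Git is actually implemented; real Git uses its own object format and hashing scheme.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Commit:
    message: str
    snapshot: tuple            # sorted (filename, content) pairs at this point in time
    parent: Optional[str]      # id of the previous commit, None for the first one

    @property
    def id(self) -> str:
        # Toy identifier derived from the commit's content (not Git's real format)
        payload = repr((self.message, self.snapshot, self.parent)).encode()
        return hashlib.sha1(payload).hexdigest()

class ToyRepo:
    def __init__(self):
        self.commits = {}                  # commit id -> Commit
        self.branches = {"main": None}     # branch name -> id of its latest commit
        self.head = "main"                 # name of the branch currently checked out

    def commit(self, message, files):
        new = Commit(message, tuple(sorted(files.items())), self.branches[self.head])
        self.commits[new.id] = new
        self.branches[self.head] = new.id  # only the current branch pointer moves
        return new.id

    def branch(self, name):
        self.branches[name] = self.branches[self.head]  # new pointer, same commit

    def checkout(self, name):
        self.head = name

    def log(self, name=None):
        commit_id = self.branches[name or self.head]
        messages = []
        while commit_id is not None:       # walk back through the parent links
            current = self.commits[commit_id]
            messages.append(current.message)
            commit_id = current.parent
        return messages

repo = ToyRepo()
first_id = repo.commit("Initial commit", {"a.txt": "hello"})
repo.branch("feature")
repo.checkout("feature")
repo.commit("Add b.txt", {"a.txt": "hello", "b.txt": "world"})

print(repo.log("main"))
print(repo.log("feature"))
```

Committing on the feature branch leaves the main pointer where it was, which is why experimental work on a branch does not affect the main codebase until it is merged.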
Using Git for version control

To use Git for version control, developers typically create a new repository on a Git hosting service such as GitHub, GitLab, or Bitbucket. They then clone the repository onto their local machine and begin making changes to the code. To commit changes, developers use Git commands such as "git add" to add changed files to the staging area, and "git commit" to create a new commit with a commit message that describes the changes. To collaborate with other developers, developers can push their changes to the remote repository and create "pull requests" that allow other developers to review the changes and provide feedback. Once changes have been reviewed and approved, they can be merged back into the main branch.

Basic terminal commands

Terminal (for Unix or Mac) or Command Prompt (for Windows) allows us to type Git commands and manage project repositories. In this section we will be focusing on terminal commands.

By default we are in the /home/vivek directory. The home and mnt folders are in the same directory (usually the highest level directory, signified by just a /)
pwd shows the current directory
clear is used to clear the command line
cd + tab key is used to cycle between sub directories in a directory
cd .. is used to move up a directory
cd mnt/ is used to enter the mnt directory. In this directory we can find the Windows c drive (basically it is a directory named c)
~ signifies your home directory
.. signifies the parent directory, one level up
/ signifies the highest level directory; you can't go up from there
mkdir is used to create a new directory
Directory names are case sensitive
Right click is used to paste an absolute path name in the terminal
ls is used to list all directories and files in a directory
rm -rf is used to remove folders: -r removes a directory and its contents recursively and -f forces removal without prompting; by default rm removes only files
git --version is used to see the version of git
touch file_name.txt is used to create a file

Basic Git commands

Git Repository is used to save project files and the information about the changes in the project. A repository can be created locally, or it can be a clone, which is a copy of a remote Git repo.

git init is used to initialize the directory as a git repository. This will create a .git folder in the directory and we can start using git features
git status shows the state of the working directory and the staging area. You will see some files under the "Untracked files:" header
git add file-name is used to add a file to the staging area. After this you will see the file under the "Changes to be committed:" header
git add . is used to add all files in the directory to the staging area (. signifies the current directory)
git rm --cached file.txt is used to unstage a file
git rm -f file.txt is used to force remove a file from the staging area and also deletes the file from the directory (-f signifies force)
git config --global user.email "abc.xyz@email.com"
git config --global user.name "abc.xyz"
git commit --help
git commit -a -m "Initial commit" (-m to include a message; -a to automatically stage files that have been modified and deleted, but new files you have not told Git about are not affected)
git log (if you want to see a shorter version then use git log --oneline)
HEAD usually points to the most recent commit on master. HEAD determines what the project directory looks like.
git checkout <commit-id> is used to see the contents of the folder as they looked during that particular commit
git checkout master is used to restore HEAD to the most recent commit, hence the contents of the project directory are also restored to what they were at the time of the most recent commit
git revert <commit-id> creates a new commit that undoes the changes introduced by that particular commit. The original commit still appears in the log, and we can reapply it by reverting the revert commit
git reset - three kinds - soft (moves HEAD back to an earlier commit but leaves the staging area and the files in the project directory untouched), mixed (the default; moves HEAD back and unstages the changes, but leaves the files in the project directory untouched) and hard (moves HEAD back and discards all changes in both the staging area and the project directory)
touch .gitignore, now open the .gitignore file with notepad and add the names of the files you don't want to track. # can be used to comment in this file. Usually you create .gitignore while initializing the project. If you have already committed files before adding them to the .gitignore file, then you need to remove them from the cache by using the following series of commands
git rm -r --cached .
git add .
git commit -m "message"
If there is a directory in your project folder and you want to ignore all files in the directory from future commits, you can add "directory-name/*" in the .gitignore file

Git Branches for Error Handling

Let's say there is an error in one of the files in the project folder. We can create a branch to fix the error while the master branch stays intact.

git checkout -b err01 (creates a new branch called err01 and switches to it)
<fix the error in one of the files in the project folder>
git add . (add all changes made on err01 to the staging area, so they can be committed)
git commit -m 'fixed error' (commit all staged changes to the err01 branch)
git checkout master (switch back to master branch)
git merge err01 (merge changes made in err01 into the master branch; the commits made on err01 become part of the master branch history)
git push (this will push master branch of project folder to remote repository)
git push origin err01 (this will push err01 branch of project folder to remote repository)
git push origin --delete err01 (we delete the err01 branch on the remote as we don't need it anymore)
git branch -d err01 (local branches can be deleted using -d)
git branch -a (list all branches)

Remote Repositories for Effective Collaboration

First step is to create a new repository on GitHub (don't add a read-me, gitignore or license). Copy the url of the repository.
Create a project folder on your local machine and browse into that folder using bash
git init (a new folder is not a repository yet; git init is used to create a new repository)
git remote add origin <paste url here>
git remote -v (you will see that the remote has been added)
In GitHub website
"Create new file" > README.md
"Create new file" > LICENSE
"Create new file" > .gitignore > in content of that file type /AutoGen to exclude all files that we keep in that folder
pull - go back to bash
git pull origin master (we don't need to specify origin master if we set master as the tracked branch)
git branch --set-upstream-to=origin/<branch> master
Sometimes you might be prompted for a login at this stage
<make changes to the local repository>
git push -u origin master (push updates to remote repository on GitHub; over HTTPS this will ask for your username and credentials, and GitHub now requires a personal access token instead of the account password)
You can add other developers as collaborators to this repository.

In summary, Git is a powerful tool for version control that allows developers to manage and track changes to code over time.
With its distributed architecture, fast performance, and support for branching and merging, Git is an essential tool for software development teams of all sizes. Comments welcome!

Coding and Maths · 2019-08-03

Introduction to Programming in R

Quick Introduction R is a programming language and environment for statistical computing and graphics. It was created in the early 1990s by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand. R is now widely used in academia, industry, and government for data analysis, statistical modeling, and data visualization. One of the key features of R is its wide range of statistical and graphical techniques. R provides a vast array of statistical and graphical methods, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and graphical techniques for data visualization. R is also highly extensible and has an active community of users and developers who create and contribute packages that enhance the capabilities of the language. R is an open-source language, which means that the code is available for free and can be modified and redistributed. This has led to the development of a large and active community of R users and developers. The R community provides a wealth of resources, including documentation, tutorials, and help forums, making it easy for users to get started with the language and to find solutions to their problems. One of the advantages of R is its integration with other programming languages and data sources. R can read data from a wide range of sources, including text files, spreadsheets, databases, and web services. R can also interact with other programming languages, such as Python, Java, and C++, allowing users to take advantage of the strengths of different languages and libraries. Another advantage of R is its versatility. R can be used for a wide range of tasks, from data analysis and visualization to machine learning and artificial intelligence. R can also be used in a variety of settings, from research and academia to industry and government. 
Most modern programming languages have a set of similar building blocks, for example:

1. Receiving input from the user and showing output to the user
2. Ability to store values in variables (usually of different kinds such as integers, floating points or characters)
3. A string of characters where you can store names, addresses, or any other kind of text
4. Some advanced data types such as arrays, which can store a series of regular variables (such as a series of integers)
5. Ability to loop your code: if you want to receive 10 names from a user, you do not write the code for that 10 times, but just once, and tell the computer to loop through it 10 times
6. Ability to execute statements of code conditionally, for example if marks are more than 40 then the student passes, else fails
7. Put your code in functions
8. Advanced data types that are formed through a combination of one or more basic data types, such as structures or classes
9. Read a file from a disk and save a file to a disk
10. Ability to comment your code so you can understand it when you revisit it some time later

Let's dive right in and see how we can do these things in R. Before we can begin to write a program in R, we need to install R and RStudio. Once done, try out the following code to see if everything is in order:

    myString <- "Hello, World!"
    print(myString)

1. Receiving input from the user and showing output to the user

There are several ways in which we can show output to the user. Let's look at some of them:

    var.1 = c(0,1,2,3)
    # Method 1: values of the variables can be printed using print()
    print(var.1)
    # Output: [1] 0 1 2 3
    # Method 2: the cat() function combines multiple items into a continuous print output
    cat("var.1 is", var.1, "\n")
    # Output: var.1 is 0 1 2 3

2. Ability to store values in variables (usually of different kinds such as integers, floating points or characters)

Basic data types: In R we call variables objects.
There are several types of objects; let's take a look at the important ones:

    # Logical
    v <- TRUE
    print(class(v))  # the class function can be used to see the data type of the variable
    # Numeric
    v <- 23.5
    print(class(v))
    # Integer
    v <- 2L
    print(class(v))
    # Complex
    v <- 2+5i
    print(class(v))
    # Character
    v <- "TRUE"
    print(class(v))
    # Raw
    v <- charToRaw("Hello")
    print(class(v))

Advanced data types: Much of R's power comes from the fact that R lets us access some advanced objects other than the basic ones shown earlier. Let's take a look at some of them:

    # Vectors - When you want to create a vector with more than one element, use the c() function, which combines the elements into a vector.
    # Create a vector.
    apple <- c('red','green',"yellow")
    print(apple)
    # Get the class of the vector.
    print(class(apple))

    # Lists - A list is an R object which can contain many different types of elements, such as vectors, functions and even another list.
    # Create a list.
    list1 <- list(c(2,5,3),21.3,sin)
    # Print the list.
    print(list1)

    # Matrices - A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the matrix function.
    # Create a matrix.
    M = matrix( c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
    print(M)

    # Arrays - While matrices are confined to two dimensions, arrays can have any number of dimensions. The array function takes a dim attribute which sets the size of each dimension. In the example below we create an array with two elements, each of which is a 3x3 matrix.
    # Create an array.
    a <- array(c('green','yellow'),dim = c(3,3,2))
    print(a)

    # Factors - Factors are R objects which are created using a vector. A factor stores the vector along with the distinct values of its elements as labels. The labels are always character, irrespective of whether the input vector is numeric, character or Boolean. They are useful in statistical modeling.
    # Factors are created using the factor() function. The nlevels function gives the count of levels.
    # Create a vector.
    apple_colors <- c('green','green','yellow','red','red','red','green')
    # Create a factor object.
    factor_apple <- factor(apple_colors)
    # Print the factor.
    print(factor_apple)
    print(nlevels(factor_apple))

    # Data frames are tabular data objects. Unlike in a matrix, each column of a data frame can contain a different mode of data. The first column can be numeric while the second column can be character and the third column can be logical. A data frame is a list of vectors of equal length. Data frames are created using the data.frame() function.
    # Create the data frame.
    BMI <- data.frame(
      gender = c("Male", "Male","Female"),
      height = c(152, 171.5, 165),
      weight = c(81,93, 78),
      Age = c(42,38,26)
    )
    print(BMI)

3. A string of characters where you can store names, addresses, or any other kind of text

Any value written within a pair of single or double quotes in R is treated as a string. The key idea here is to learn how to manipulate string variables. There are a few common operations that we will focus on:

a. Concatenate strings

    # Syntax
    paste(str1, str2, str3, ... , sep = " ", collapse = NULL)

b. Counting the number of characters in a string - nchar() function

    # Syntax (test_str is any existing string)
    nchar(test_str)

c. Changing the case - toupper() & tolower() functions

    str = 'apPlE'
    toupper(str)  # APPLE
    tolower(str)  # apple

d. Extracting parts of a string - substring() function

    # Syntax
    substring(x,first,last)
    # Example - Extract characters from the 5th to the 7th position.
    result <- substring("Extract", 5, 7)
    print(result)

e. Formatting - Numbers and strings can be formatted to a specific style using the format() function.

    # Syntax
    format(x, digits, nsmall, scientific, width, justify = c("left", "right", "centre", "none"))
    # Example
    # Total number of digits displayed. Last digit rounded off.
    result <- format(23.123456789, digits = 9)
    print(result)
    # Display numbers in scientific notation.
    result <- format(c(6, 13.14521), scientific = TRUE)
    print(result)
    # The minimum number of digits to the right of the decimal point.
    result <- format(23.47, nsmall = 5)
    print(result)
    # Format treats everything as a string.
    result <- format(6)
    print(result)
    # Numbers are padded with blanks at the beginning to reach the width.
    result <- format(13.7, width = 6)
    print(result)
    # Left justify strings.
    result <- format("Hello", width = 8, justify = "l")
    print(result)
    # Justify strings to the center.
    result <- format("Hello", width = 8, justify = "c")
    print(result)

4. Some advanced data types such as arrays, which can store a series of regular variables (such as a series of integers)

Arrays are a series of values of the same type stored together in one variable. Arrays can be one-dimensional or multi-dimensional. An array is created using the array() function. It takes vectors as input and uses the values in the dim parameter to create an array. For example, if we create an array of dimension (2, 3, 4) then it creates 4 rectangular matrices, each with 2 rows and 3 columns.

    # dim=c(rows, columns, matrices)
    array2 = array(1:12, dim=c(2, 3, 2))
    # Naming columns and rows
    column.names <- c("COL1","COL2","COL3")
    row.names <- c("ROW1","ROW2")
    matrix.names <- c("Matrix1","Matrix2")
    array2 = array(1:12, dim=c(2, 3, 2), dimnames = list(row.names, column.names, matrix.names))

Let's see how we can access array elements:

    # dim=c(rows, columns, matrices)
    print(array2[2,,2])   # Print the second row of the second matrix of the array.
    print(array2[1,3,1])  # Print the element in the 1st row and 3rd column of the 1st matrix.
    print(array2[,,2])    # Print the 2nd matrix.
Since a slice such as array2[,,2] is itself a matrix, we can perform matrix operations on it.

Calculations Across Array Elements (we can use user-defined functions as well): apply(), lapply(), sapply(), tapply()

    # apply(X, MARGIN, FUN) - apply to rows (MARGIN = 1), columns (MARGIN = 2) or both - input to this function is a matrix, array or data frame - output is a vector, list or array
    m1 <- matrix(1:10, nrow=5, ncol=2)
    apply(m1, 2, sum)  # column sums: 15 40
    # lapply(X, FUN) - apply to all elements - input to this function is a list, vector or data frame - output is a list
    # sapply(X, FUN) - apply to all elements - input to this function is a list, vector or data frame - output is a vector or a matrix
    movies <- c("BRAVEHEART","BATMAN","VERTIGO","GANDHI")
    lapply(movies, tolower)
    sapply(movies, tolower)
    # tapply(X, INDEX, FUN = NULL) - apply to each group defined by a factor - input to this function is a vector - output is an array
    data(iris)
    tapply(iris$Sepal.Width, iris$Species, median)

5. Ability to loop your code: if you want to receive 10 names from a user, you do not write the code for that 10 times, but just once, and tell the computer to loop through it 10 times

R has several looping options (repeat, while and for). Loops can also be nested (single, double, triple, ..).

a. The repeat loop executes the same code again and again until a stop condition is met:

    # Syntax
    repeat {
      commands
      if(condition) {
        break
      }
    }
    # Example
    v <- c("Hello","loop")
    cnt <- 2
    repeat {
      print(v)
      cnt <- cnt+1
      if(cnt > 5) {
        break
      }
    }

b. The while loop executes the same code again and again as long as the test condition holds:

    # Syntax
    while (test_expression) {
      statement
    }
    # Example
    v <- c("Hello","while loop")
    cnt <- 2
    while (cnt < 7) {
      print(v)
      cnt = cnt + 1
    }

c. The for loop:

    # Syntax
    for (value in vector) {
      statements
    }
    # Example
    v <- LETTERS[1:4]
    for ( i in v) {
      print(i)
    }

R also provides the break and next statements that allow us to alter the loops further.
Following is their use: When the break statement is encountered inside a loop, the loop is immediately terminated and program control resumes at the next statement following the loop. On encountering next, R skips the rest of the current iteration and starts the next iteration of the loop.

6. Ability to execute statements of code conditionally, for example if marks are more than 40 then the student passes, else fails

R provides if, if..else, if..else if..else, and switch options to apply conditional logic. Let's take a look at them:

a. The basic syntax for creating an if statement in R is:

    # Syntax
    if (test_expression) {
      statement
    }
    # Example
    x <- 5
    if(x > 0){
      print("Positive number")
    }

b. The basic syntax for creating an if…else statement in R is:

    # Syntax
    if (test_expression) {
      statement1
    } else {
      statement2
    }
    # Example
    x <- -5
    if(x >= 0){
      print("Non-negative number")
    } else {
      print("Negative number")
    }

c. The basic syntax for creating an if…else if…else statement in R is:

    # Syntax
    if (test_expression1) {
      statement1
    } else if (test_expression2) {
      statement2
    } else if (test_expression3) {
      statement3
    } else {
      statement4
    }
    # Example
    x <- 0
    if (x < 0) {
      print("Negative number")
    } else if (x > 0) {
      print("Positive number")
    } else print("Zero")

d. A switch statement allows a variable to be tested for equality against a list of values. Each value is called a case, and the variable being switched on is checked for each case.

    x <- switch(
      2,
      "first",
      "second",
      "third",
      "fourth"
    )
    print(x)  # "second"

7. Put your code in functions

In R a user-defined function is created by using the keyword function.

    # Syntax
    function_name <- function(arg_1, arg_2, ...) {
      Function body
    }
    # Example
    # Create a function to print squares of numbers in sequence.
    new.function <- function(a) {
      for(i in 1:a) {
        b <- i^2
        print(b)
      }
    }

We can call the function new.function supplying 6 as an argument.

    new.function(6)

We can also create functions that take several arguments.
These functions can also be defined to use default values for those arguments in case the user does not provide a value. Let's see how this is done:

    new.function <- function(a = 3, b = 6) {
      result <- a * b
      print(result)
    }

Now we can call this with or without passing any values:

    # Call the function without giving any argument.
    new.function()
    # Call the function giving new values for the arguments.
    new.function(9,5)

8. Advanced data types that are formed through a combination of one or more basic data types, such as structures or classes

A class is the blueprint that helps to create an object; it describes the object's member variables and attributes. R lets you create two types of classes that we mention here, S3 and S4.

S3 classes: These are informal and let you overload functions through generic function dispatch.
S4 classes: These are formal; the data each object may hold is declared up front and checked when the object is created, which makes mistakes easier to catch.

We will cover S4 classes here. An S4 class is defined by the setClass() function.

    # Defining a class
    setClass("emp_info", slots=list(name="character", age="numeric", contact="character"))
    emp1 <- new("emp_info", name="vivek", age=30, contact="somewhere on the internet")
    # Access elements of a class
    emp1@name

9. Read a file from a disk and save a file to a disk

Let's see how to read and write a csv in an organized way. CSV is the most common file type you will be using for data science; however, R can read several other file types as well.

    # read a csv file
    data <- read.csv('file.csv')
    # write the data back to a csv file
    write.csv(data, 'file.csv', row.names = FALSE)

10. Ability to comment your code so you can understand it when you revisit it some time later

We can tell R that a line of code is a comment by starting it with a #.

    # this is a comment

In summary, R is a powerful and versatile programming language that is widely used for statistical computing and graphics. Its extensive range of statistical and graphical techniques, its open-source nature, and its active community of users and developers make it a valuable tool for data analysis and modeling.
Whether you are a researcher, data analyst, or developer, R provides a wide range of tools and resources for working with data and creating meaningful insights. To close I will emphasize the importance of practicing in learning anything new. Persistence and trying out different combinations of these building blocks for solving easier problems first and more complex ones later on is the only way to become fluent. Comments welcome!

Coding and Maths · 2019-07-06

Introduction to Programming in Markdown

Coding and Maths · 2019-06-01

Introduction to Programming in Python

Quick Introduction Python is a high-level, interpreted programming language that was first released in 1991 by Guido van Rossum. It is a general-purpose language that is designed to be easy to use, with a focus on readability and simplicity. Python is often used for web development, data analysis, artificial intelligence, scientific computing, and other types of software development. One of the key features of Python is its ease of use. Python’s syntax is designed to be simple and intuitive, making it accessible to both beginner and experienced programmers. Python is also an interpreted language, meaning that it does not require compilation, which makes it easy to write and test code quickly. Another important feature of Python is its support for object-oriented programming. Python allows users to create classes and objects, and to define methods on those objects. This makes it a powerful tool for building complex software systems. Python also includes a large and growing library of built-in modules and packages. These modules provide a wide range of functionality, from working with strings, arrays, and dictionaries to working with databases, web frameworks, and machine learning tools. Python’s open-source ecosystem is one of its biggest strengths, as it allows developers to easily access and integrate with a wide range of third-party libraries and tools. One of the most popular web development frameworks built in Python is Django. Django is a full-stack web framework that provides a set of conventions and tools for building web applications quickly and easily. With its focus on developer productivity, Django has become a popular choice for startups, small businesses, and large enterprises. Python’s popularity has also been driven by its use in data analysis and scientific computing. With packages like NumPy, Pandas, and Matplotlib, Python has become a leading language for data analysis and visualization. 
In recent years, Python has also become a popular language for artificial intelligence and machine learning, with packages like TensorFlow, PyTorch, and Scikit-learn providing powerful tools for building machine learning models.

Most modern programming languages have a set of similar building blocks, for example:

1. Receiving input from the user and showing output to the user
2. Ability to store values in variables (usually of different kinds such as integers, floating points or characters)
3. A string of characters where you can store names, addresses, or any other kind of text
4. Some advanced data types such as arrays, which can store a series of regular variables (such as a series of integers)
5. Ability to loop your code: if you want to receive 10 names from a user, you do not write the code for that 10 times, but just once, and tell the computer to loop through it 10 times
6. Ability to execute statements of code conditionally, for example if marks are more than 40 then the student passes, else fails
7. Put your code in functions
8. Advanced data types that are formed through a combination of one or more basic data types, such as structures or classes
9. Read a file from a disk and save a file to a disk
10. Ability to comment your code so you can understand it when you revisit it some time later

Let's dive right in and see how we can do these things in Python.

0. How to install Python on your desktop?

Before we can begin to write a program in Python, we need to install Anaconda. This will install the Anaconda data science environment and the Spyder IDE for coding in Python. Once done, go ahead and open Spyder and try out the following code to see if everything is in order.

    myString = "Hello, World!"
    print(myString)

1. Receiving input from the user and showing output to the user

There are several ways in which we can show output to the user.
Let's look at some ways of showing output:

    name = input('please enter your name')   # receiving character input
    print("hello ", name, ",how are you?")   # showing character output
    age = input('please enter your age')     # receiving numeric input (input() still returns it as a string)
    print('so you are', age, 'years old.')

2. Ability to store values in variables (usually of different kinds such as integers, floating points or characters)

Python is dynamically typed: you don't need to declare a variable's data type before using it. This can sometimes cause unexpected problems, for example if a user enters a character where you expect a number. To catch this kind of problem, type() can be used to check what a variable actually holds. Alternatively, you can "define the variable" by assigning it an initial value (like age=20).

Basic data types: In Python we have several types of objects; let's take a look at the important ones:

    # Boolean / Logical
    v = True
    print(type(v))  # the type function can be used to see the data type of the variable
    # Numeric (floating point)
    v = 23.5
    print(type(v))
    # Integer
    v = 2
    print(type(v))
    # Complex (Python uses j for the imaginary part)
    v = 2+5j
    print(type(v))
    # Character (string)
    v = "TRUE"
    print(type(v))

    # Some common number functions:
    hex(1)            # hexadecimal representation of numbers
    bin(1)            # binary representation of numbers
    2**3              # 2^3, 2 to the power 3
    pow(2,3)          # 2**3
    pow(2,3,4)        # 2**3 % 4
    abs(-2.33)        # absolute value
    round(3.14)       # rounds to the nearest integer
    round(3.14159,2)  # only till 2 decimal places
    import math
    sq_rt = math.sqrt(16)  # returns the square root of the number

Advanced data types: Much of Python's power comes from the fact that it lets us access some advanced variable types other than the basic ones shown earlier. Let's take a look at some of them:

    # Lists - A list can contain many different types of elements such as character, numeric, etc. and even another list.
    # Create a list through enumeration.
    a = []                   # with this we initialize an empty list
    a = list(range(1, 10))   # with this we fill the list with the values 1 to 9
    print(a)
    # Output: [1, 2, 3, 4, 5, 6, 7, 8, 9]
    # 10 is excluded because the upper bound is excluded in Python
    # we can have mixed data types in a list
    b = [1,2,3,'vivek',True,4,5]
    print(b)
    # indexes of a list start with 0, 1, 2 ..
    # so vivek is present at index 3
    print(b[3])
    # slicing - [start:stop:step]
    a[1:6:2]  # starts from index 1, goes up to (but not including) index 6 and selects every second element
    # reversing a list
    a[::-1]   # this would take a lot more effort to do in C++!

    # tuples - immutable lists, can't be changed
    t = (1,2,3)  # use () instead of []

    # dict - d = {'key':'value', ..} is a mutable collection of key:value pairs (insertion order is preserved since Python 3.7)
    {"name":"frankie","age":33}
    # A dictionary is quite useful in matrix indexing
    import numpy as np
    m = np.array([[1,2,3],[4,5,6],[7,8,9]])
    col_names = {'age':0, 'weight':1, 'height':2}
    row_names = {'aa':0, 'cc':1, 'bb':2}
    # now we can get the weight of 'cc' using actual indexes or dict indexes
    m[1,1]                                     # 5
    m[row_names['cc'],col_names['weight']]     # 5

    # set - s = set(['a','b','c']) or s = {'a','b','c'} - unordered collection of unique objects
    # It looks like a dictionary {"a","b"} when Python shows output, but it is not because it doesn't have key:value pairs
    set([1,1,2,3])      # output: {1, 2, 3} , a list can be passed to set()
    set("Mississippi")  # output: {'M', 'i', 'p', 's'} (order may vary) , even strings can be passed to set()

    # Matrices - A matrix is a two-dimensional rectangular data set. It can be created using the np.array() function.
    # Create a matrix
    import numpy as np  # we need to import the numpy library, which provides tools for numerical computing.
    m = np.array([[1,2,3],[4,5,6],[7,8,9]])
    print(type(m))

    # Arrays - while matrices are confined to two dimensions, arrays can have any number of dimensions.
    # Create an array.
    import numpy as np  # we need to import the numpy library, which provides tools for numerical computing.
    a = np.array([1,2,3])  # this is a 1-dimensional array
    print(type(a))
    # Convert a list to an array
    a = [1,2,3,4]
    a = np.array(a)  # array([1, 2, 3, 4])

    # DataFrame - this is an advanced object that can be used by installing the pandas library. If you are familiar with R, this is similar to data.frame. If you are familiar with Excel, you can think of a dataframe as a table with rows and columns, where rows and columns can have names/labels. You can access data within the dataframe using row/column numbers (indexing starts from 0) or their labels.
    import pandas as pd
    # from dict
    pd.DataFrame({'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']})
    # from list
    pd.DataFrame(['orange', 'mango', 'grapes', 'apple'], index=['a', 'b', 'c', 'd'], columns=['Fruits'])
    # from list of lists
    pd.DataFrame([['orange','tomato'],['mango','potato'],['grapes','onion'],['apple','chilly']], index=['a', 'b', 'c', 'd'], columns=['Fruits', 'Vegetables'])
    # from multiple lists
    pd.DataFrame(list(zip(['orange', 'mango', 'grapes', 'apple'], ['tomato', 'potato', 'onion', 'chilly'])), index=['a', 'b', 'c', 'd'], columns=['Fruits', 'Vegetables'])

3. A string of characters where you can store names, addresses, or any other kind of text

Any value written within a pair of single or double quotes in Python is treated as a string. The key idea here is to learn how to manipulate string variables. There are a few common operations that we will focus on:

a. Concatenate strings

    # Concatenate strings (str1, str2 and str3 are existing strings)
    str1 + str2 + " " + str3

b. Counting the number of characters in a string

    str1 = "vivek"
    len(str1)

c. Changing the case - upper() & lower() methods

    str1.upper()                    # convert string to upper case (.lower() for lower case)
    str1.isupper(), str1.islower()  # check if a string or a character is upper or lower case

d. Splitting a string

    str1.split('e')  # returns a list of the strings before and after e. If there are multiple e's, the split happens at every instance of e

e.
Reversing a string (a string is a palindrome if it equals its own reverse)

    str1 = "vivek"
    str1[::-1]  # 'keviv'

4. Some advanced data types such as lists, which can store a series of regular variables (such as a series of integers)

Lists are a series of variables stored together in one variable. Lists can be one-dimensional or multi-dimensional. A list is created using square brackets or the list() function. It takes variables (even other lists) as elements. A list is different from a string because its elements can be mutated/changed.

    # Defining
    L = [0,0,0]   # [0, 0, 0]
    L1 = [0]*3    # shorthand way of defining a list with repeated elements
    # Supports indexing and slicing
    L1 = ['one', 'two', 'three']
    L1[0]    # 'one'
    L1[1:2]  # ['two'], upper bound is excluded
    L1[1:3]  # ['two', 'three']
    # Indexing nested lists
    L1 = ['one', 'two', ['three', 'four'], 'five']
    L1[2][0]  # 'three'
    # Elements can be added
    L1.append('six')
    # Elements can be removed
    L1.pop()  # last element gets popped, we can save it in a variable also
    # Sort (the elements must be comparable with each other, so we use a list of strings only)
    L1 = ['one', 'two', 'three']
    L1.sort()   # sorts the list in-place, the actual list gets sorted
    sorted(L1)  # returns a sorted copy of the L1 list
    # Reverse
    L1 = ['c','a','b']
    L1.reverse()  # reverses the list in-place, the actual list gets reversed
    # Multi-dimensional list indexing
    L1 = [[1,2,3],[4,5,6],[7,8,9]]
    L1[0][:]  # returns the first row

5. Ability to loop your code: if you want to receive 10 names from a user, you do not write the code for that 10 times, but just once, and tell the computer to loop through it 10 times

Python has several looping options such as 'for' and 'while'. Loops can also be nested (single, double, triple, ..).

a. The while loop executes the same code again and again as long as the test condition holds:

    # Syntax
    while test:
        code statements
    else:
        final code statements
    # Example
    x = 0
    while x < 10:
        print('x is currently: ', x)
        print(' x is still less than 10, adding 1 to x')
        x += 1

b. The for loop: acts as an iterator in Python; it goes through items that are in a sequence or any other iterable item.
Objects that we've learned about that we can iterate over include strings, lists, tuples, and even built-in iterables for dictionaries, such as keys or values.

    # Syntax
    for item in object:
        statements to do stuff
    # Example
    list1 = [1,2,3,4,5,6,7,8,9,10]
    for num in list1:
        print(num)

Python also provides the break, continue and pass statements that allow us to alter the loops further. Following is their use:

break: Breaks out of the current closest enclosing loop.
continue: Goes to the top of the closest enclosing loop.
pass: Does nothing at all.

Thinking about break and continue statements, the general format of the while loop looks like this:

    while test:
        code statement
        if test:
            break
        if test:
            continue
    else:
        final code statements

break and continue statements can appear anywhere inside the loop's body, but we will usually put them further nested in conjunction with an if statement to perform an action based on some condition.

6. Ability to execute statements of code conditionally, for example if marks are more than 40 then the student passes, else fails

Python provides if, if..else, and if..elif..else statements to apply conditional logic. Let's take a look at them:

a. The basic syntax for creating an if statement is:

    if True:
        print('It was true!')

b. The basic syntax for creating an if…else statement is:

    x = False
    if x:
        print('x was True!')
    else:
        print('I will be printed in any case where x is not true')

c. The basic syntax for creating an if…elif…else statement is:

    loc = 'Bank'
    if loc == 'Auto Shop':
        print('Welcome to the Auto Shop!')
    elif loc == 'Bank':
        print('Welcome to the bank!')
    else:
        print('Where are you?')

7. Put your code in functions

Functions allow us to create a block of code that can be executed many times without needing to write it again.
    # Syntax
    def name_of_function(argument_name='default value'):  # snake case for the name: all lowercase letters with underscores
        '''
        what the function does
        '''
        print('hello', argument_name)
        print(f'hello {argument_name}')  # both prints show the same text

    # Example
    def add_function(a=0, b=0):
        return a+b

We can call the function in the following two ways:

    # option 1
    add_function(2,3)
    # option 2
    c = add_function(3,4)

*args and **kwargs stand for arguments and keyword arguments and allow us to extend the functionality of functions.

*args lets a function take an arbitrary number of arguments. All arguments are received as a tuple, for example (a,b,c,..). args can be renamed to something else; what really matters is the *.

    def myfunc(*args):
        return args
    '''
    myfunc(1,2,3,4,5,6,7,8,9)
    Out[30]: (1, 2, 3, 4, 5, 6, 7, 8, 9)
    '''

**kwargs lets the function take an arbitrary number of keyword arguments. All arguments are received as a dictionary of key,value pairs. kwargs can be renamed to something else; what really matters is the **.

    def myfunc(**kwargs):
        print(kwargs)
    '''
    myfunc(name='vivek', age=34, height=186)
    {'name': 'vivek', 'age': 34, 'height': 186}
    '''

8. Advanced data types that are formed through a combination of one or more basic data types, such as structures or classes

Python allows users to create classes. These can be a combination of variables and functions that operate on those variables. Let's take a look at how we can define and use them.

    # Define a class
    class Person:
        "This is a person class"
        age = 10
        def greet(self):
            print('Hello')

    # Using the class
    print(Person.age)      # Output: 10
    print(Person.greet)    # Output: <function Person.greet at 0x...>
    print(Person.__doc__)  # Output: This is a person class

    # Creating an object of the class and using that
    vivek = Person()       # create a new object of the Person class
    print(vivek.greet)     # Output: <bound method Person.greet of <__main__.Person object at 0x...>>
    vivek.greet()          # Calling the object's greet() method; Output: Hello

9.
Read a file from a disk and save a file to a disk

Let's see how to read and write a csv file in an organized way. CSV is the most common file type you will be using for data science; however, Python can read several other file types, and data directly from websites, as well.

    import pandas
    # read a csv using the pandas package
    df = pandas.read_csv('student_data.csv')
    print(df)
    # write data to a csv using the pandas package
    df.to_csv('student_data_copy.csv')

10. Ability to comment your code so you can understand it when you revisit it some time later

We can tell Python that a line of code is a comment by starting it with a #.

    # this is a comment

We can also mark a multi-line block of text by enclosing it in triple quotes. Strictly speaking this creates a string that Python evaluates and discards, but it is commonly used as a block comment.

    '''
    this is a
    comment block
    '''

Overall, Python is a versatile and powerful programming language that is well-suited for a wide range of programming tasks. With its emphasis on simplicity, object-oriented design, and a large and growing ecosystem of third-party libraries and tools, Python is a valuable tool for both beginner and experienced programmers. Whether building web applications, analyzing data, or working on artificial intelligence projects, Python provides a fast, flexible, and enjoyable development experience. To close, I will emphasize the importance of practice in learning anything new. Persistence and trying out different combinations of these building blocks, solving easier problems first and more complex ones later on, is the only way to become fluent. Comments welcome!
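One loose end from section 2 of this post: input() always hands back a string, so a user typing a letter where a number is expected only shows up as a problem later. Below is a minimal sketch of one way to guard against that. The helper name parse_age is made up for this illustration and is not part of Python or any library, and the fixed list of strings stands in for what input() would return.

```python
def parse_age(text):
    """Convert user-entered text to an int, or return None if it is not a whole number.

    parse_age is a hypothetical helper written for this sketch.
    """
    try:
        return int(text.strip())
    except ValueError:
        # int() raises ValueError for text such as "abc" or "3.5"
        return None


# In a real program these strings would come from input();
# fixed values keep the sketch runnable without a keyboard.
for raw in ["34", " 20 ", "abc"]:
    age = parse_age(raw)
    if age is None:
        print(f"{raw!r} is not a valid age")
    else:
        print(f"so you are {age} years old.")
```

Returning None instead of raising keeps the calling loop simple; a stricter program might prefer to let the ValueError propagate, or to keep asking until the user enters a valid number.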

Coding and Maths · 2019-05-04

Introduction to Programming in Julia

Quick Introduction Julia is a high-level, high-performance programming language that was created in 2012 by a team of computer scientists led by Jeff Bezanson, Stefan Karpinski, and Viral Shah. Julia was designed to address the limitations of traditional scientific computing languages, such as MATLAB, Python, and R, while still retaining their ease of use and flexibility. One of the key features of Julia is its performance. Julia is designed to be fast, with execution speeds comparable to those of compiled languages such as C and Fortran. This is achieved through a combination of just-in-time (JIT) compilation, which compiles code on the fly as it is executed, and type inference, which allows Julia to determine the data types of variables at runtime. Another important feature of Julia is its support for multiple dispatch. Multiple dispatch allows Julia to select the appropriate method to use based on the types of the arguments being passed to a function. This makes Julia a flexible and expressive language that can be easily extended and customized to fit a wide range of programming tasks. Julia also includes a number of built-in data structures and libraries that make it easy to work with arrays, matrices, and other scientific computing tools. These include tools for linear algebra, statistics, optimization, and machine learning, as well as support for distributed computing and parallelism. In addition to its scientific computing features, Julia also includes support for general-purpose programming tasks, such as web development, database access, and file I/O. Julia’s growing package ecosystem provides a wide range of libraries and tools for these tasks, making it a versatile language that can be used for a variety of programming tasks. One of the key benefits of Julia is its community. Julia has a rapidly growing community of developers and users who are actively contributing to the language and its ecosystem. 
This community has created a large number of high-quality packages, as well as a number of online resources and forums for learning and discussing the language.

Most modern programming languages have a set of similar building blocks, for example:

1. Receiving input from the user and showing output to the user
2. Ability to store values in variables (usually of different kinds such as integers, floating points or characters)
3. A string of characters where you can store names, addresses, or any other kind of text
4. Some advanced data types such as arrays which can store a series of regular variables (such as a series of integers)
5. Ability to loop your code: if you want to receive 10 names from a user, you do not write the code for that 10 times, but just once, and tell the computer to loop through it 10 times
6. Ability to execute statements of code conditionally, for example if marks are more than 40 then the student passes, else fails
7. Put your code in functions
8. Advanced data types that are formed through a combination of one or more types of basic data types such as structures or classes
9. Read file from a disk and save file to a disk
10. Ability to comment your code so you can understand it when you revisit it some time later

Let's dive right in and see how we can do these things in Julia.

0. How to install Julia on your desktop?

Before we can begin to write a program in Julia, we need to install Julia. Next you can install VSCode. Now launch VSCode and install the Julia (by julialang) extension. Now you can create a new test.jl file, add the following code, and see if it runs.

    4+2; # If you don't want to see the result of the expression printed, use a semicolon at the end of the expression
    ans; # the value of the last expression you typed on the REPL is stored in the variable ans (available in the REPL only)

Before we dive in, chaining functions is possible in Julia, like so:

    1:10 |> collect

1.
Receiving input from the user and Showing output to the user

Julia reads a line of input with readline() and shows output with println():

    # receiving input from user
    name = readline(stdin)
    # showing output to user
    println("your name is ", name)

2. Ability to store values in variables (usually of different kinds such as integers, floating points or character)

Names of variables are in lower case. Word separation can be indicated by underscores. Julia's types are broadly classified into concrete and abstract types. Types that can have subtypes (e.g. Any, Number) are called abstract types; they cannot have instances. Types that can have instances are called concrete types; they cannot have any subtypes. Concrete types can be further divided into primitive (or basic) and complex (or composite). Let's take a deeper look:

    # Primitive types
    ## the basic integer and float types (signed and unsigned): Int8, UInt8, Int16, UInt16, Int32, UInt32, Int64, UInt64, Int128, UInt128, Float16, Float32, and Float64
    a = 10
    ## more advanced numeric types: BigFloat, BigInt
    a = BigInt(2)^200
    ## Boolean and character types: Bool and Char
    selected = true
    ## Text string types: String
    name = "vivek"

    # Composite type
    ## Rational, used to represent fractions. It is composed of two pieces, a numerator and a denominator, both integers (of type Int)
    666//444 # To make rational numbers, use two slashes (//); this one reduces to 3//2

Some advanced data types include dictionaries and sets. Sets are similar to arrays, with the difference that they don't allow duplicate elements.
    dict = Dict("a" => 1, "b" => 2, "c" => 3)
    dict = Dict{String,Integer}("a" => 1, "b" => 2) # If you know the types of the keys and values in advance, you can specify them after the Dict keyword, in curly braces

    # looking things up
    dict["a"]
    values(dict) # to retrieve all values
    keys(dict)   # to retrieve all keys

    # these can be useful for iterating
    for k in keys(dict)
        println(k)
    end
    for (key, value) in dict
        println(key, " => ", value)
    end

    d1 = Dict("a" => 1, "b" => 2) # two small example dictionaries
    d2 = Dict("c" => 3)
    merge(d1, d2) # merge() function which can merge two dictionaries
    findmin(d1)   # find the minimum value in a dictionary, and return the value, and its key
    filter(p -> p.first == "a", d1) # the function receives each key => value pair; this keeps only the key "a"

    # sort dict - you can use the SortedDict data type from the DataStructures.jl package
    import Pkg
    Pkg.add("DataStructures")
    import DataStructures
    dict = DataStructures.SortedDict("b" => 2, "c" => 3, "d" => 4, "e" => 5, "f" => 6)

    # Sets - A set is a collection of elements, just like an array or dictionary, with no duplicated elements.
    colors = Set{String}(["red", "green", "blue", "yellow"])
    push!(colors, "black") # You can use push!() to add elements to a set
    rainbow = Set(["red", "orange", "yellow", "green", "blue", "indigo", "violet"]) # a second set to compare against
    union(colors, rainbow)     # The union of two sets is the set of everything that is in one or the other sets
    intersect(colors, rainbow) # The intersection of two sets is the set that contains every element that belongs to both sets
    setdiff(colors, rainbow)   # The difference between two sets is the set of elements that are in the first set, but not in the second

We will discuss abstract data types in section 8 below.

3. A string of characters where you can store names, addresses, or any other kind of text

Any value written within a pair of double quotes in Julia is treated as a string.

    "this is a string" # double quotes and dollar signs inside the string need to be preceded (escaped) with a backslash
    """this is "a" string with double quotes""" # triple double quotes can be used to store strings with double quotes in them

Julia also allows the user to indicate special strings.
    # special strings
    # r" "   indicates a regular expression
    # v" "   indicates a version string
    # b" "   indicates a byte literal
    # raw" " indicates a raw string that doesn't do interpolation

The key idea here is to learn how to manipulate string variables. There are a few common operations that we will focus on:

a. Concatenate strings

    # Concatenate strings
    s = "You know my methods, Watson."
    string("My dear ", "Frodo") # string() (or the * operator) concatenates strings
    join(split(s, r"a|e|i|o|u", keepempty=false), "aiou") # You can join the elements of a split string in array form using join()

b. Counting number of characters in a string

    # Counting number of characters in a string
    str = "julia"
    length(str)    # to find the length of a string
    lastindex(str) # to find index of last char of string

c. Changing the case - uppercase() & lowercase() functions

    uppercase(s)

d. Splitting a string

    split("You know my methods, Watson.")      # by default splits on space
    split("You know my methods, Watson.", 'W') # splits on the char W
    # If you want to split a string into separate single-character strings, use the empty string ("")
    split("You know my methods, Watson.", r"a|e|i|o|u", keepempty=false) # splits string on the char that matches any of the vowels
    # keepempty=false makes sure that empty strings are not returned

e. String interpolation

    # string interpolation - use the results of Julia expressions inside strings.
    x = 42
    "The value of x is $(x)." # "The value of x is 42."

f. Iterate over a string

    for char in s # iterate through a string
        print(char, "_")
    end

g. Get index of all characters in a string

    for i in eachindex(str)
        @show str[i]
    end

h. Converting between numbers and strings

    a = BigInt(2)^200
    a = string(a)    # convert number to string
    parse(BigInt, a) # convert strings to numbers

i. Finding and replacing things inside strings

    s = "My dear Frodo";
    in('M', s)          # true
    occursin("Fro", s)  # true
    findfirst("My", s)  # 1:2
    replace(s, "Frodo" => "Frodo Baggins")

There are a lot of other functions as well:

    length(str) - length of string (number of characters)
    sizeof(str) - size of string in bytes
    startswith(strA, strB) - does strA start with strB?
    endswith(strA, strB) - does strA end with strB?
    occursin(strA, strB) - does strA occur in strB?
    all(isletter, str) - is str entirely letters?
    all(isnumeric, str) - is str entirely number characters?
    isascii(str) - is str ASCII?
    all(iscntrl, str) - is str entirely control characters?
    all(isdigit, str) - is str 0-9?
    all(ispunct, str) - does str consist of punctuation?
    all(isspace, str) - is str whitespace characters?
    all(isuppercase, str) - is str uppercase?
    all(islowercase, str) - is str entirely lowercase?
    all(isxdigit, str) - is str entirely hexadecimal digits?
    uppercase(str) - return a copy of str converted to uppercase
    lowercase(str) - return a copy of str converted to lowercase
    titlecase(str) - return copy of str with the first character of each word converted to uppercase
    uppercasefirst(str) - return copy of str with first character converted to uppercase
    lowercasefirst(str) - return copy of str with first character converted to lowercase
    chop(str) - return a copy with the last character removed
    chomp(str) - return a copy with the last character removed only if it's a newline

4. Some advanced data types such as arrays which can store a series of regular variables (such as a series of integers)

Arrays can be one-dimensional or multi-dimensional. An array is created using square brackets, the Array constructor, or several other methods. Arrays support a lot of functionality within Julia, so I have covered them in more detail in a separate array-specific article. For now let's check out the key functionality.

    # Defining
    # Creating arrays by initializing
    arr_Int64 = [1, 2, 3, 4, 5]
    # Creating empty arrays
    b = Int64[]
    # Creating 2-d arrays
    arr_2d = [1 2 3 4] # If you leave out the commas when defining an array, you can create 2D arrays quickly. Here's a single row, multi-column array
    arr_2d = [1 2 3 4 ; 5 6 7 8] # you can add another row using ;
    # Creating arrays using range objects
    a = 1:10   # creates a range variable with 10 elements from 1 to 10
    collect(a) # collect turns a range variable into an array
    [a...]
    # instead of collect, you could use the ellipsis (...) operator (three periods) after the last element
    range(1, length=12, stop=100) # Julia calculates the missing pieces for you by combining the values given for the keywords step, length, and stop

    # Using comprehensions and generators to create arrays
    [n^2 for n in 1:5]              # a 1-d array
    [r * c for r in 1:5, c in 1:5]  # a 2-d array

    # Reshape an array to create a multi-dimensional array
    reshape([1, 2, 3, 4, 5, 6, 7, 8], 2, 4) # create a simple array and then change its shape

    # Supports indexing and slicing
    # 1-d
    a[5]     # 5th element
    a[end]   # last element
    a[end-1] # second last element
    # 2-d
    a = [[1, 2] [3, 4]] # a 2x2 matrix whose columns are [1, 2] and [3, 4]
    a[2,2]  # element at row-2 x col-2
    a[:,2]  # all elements of col-2
    getindex(a, 2, 2) # same as a[2,2]

    # Elements can be added
    a = [1, 2, 3, 4, 5, 6] # a plain vector of integers, so that integers can be pushed onto it
    push!(a, 7)      # The push!() function pushes another item onto the back of an array
    pushfirst!(a, 0) # To add an item at the front
    # splice!() inserts or replaces elements in an array at a given index
    splice!(a, 4:5, 4:6) # replace the elements at positions 4:5 with the range of numbers 4:6
    L = ['a','b','f'];
    splice!(L, 3:2, ['c','d','e']) # insert c, d, e between b and f

    # Elements can be removed
    splice!(a, 5); # If you don't supply a replacement, splice!() removes the element and moves the rest along
    pop!(a)        # To remove the last item
    popfirst!(a)   # To remove the first item

    # Elementwise and vectorized operations
    a / 100 # every element of the new array is the original divided by 100. These operations operate elementwise
    n1 = 1:6; n2 = 2:7; n1 .* n2; # if two arrays are to be multiplied then we just add a . before the mathematical operator to signify elementwise
    # the first element of the result is what you get by multiplying the first elements of the two arrays, and so on

    # How function works on individual variables
    f(a, b) = a * b
    a = 10; b = 20; print(f(a, b))
    # How function can be applied elementwise to arrays
    n1 = 1:6; n2 = 2:7; print(f.(n1, n2))

5.
Ability to loop your code: if you want to receive 10 names from a user, you do not write the code for that 10 times, but just once, and tell the computer to loop through it 10 times

Julia has several looping options such as 'for' and 'while'. There are also options of nesting (single, double, triple, ..) loops.

a. The while loop executes the same code again and again until a stop condition is met:

    # while end - iterative conditional evaluation
    x = 0
    while x < 4
        println(x)
        global x += 1
    end

b. The for loop acts as an iterator in Julia; it goes through items that are in a sequence or any other iterable item. Objects that we've learned about that we can iterate over include strings, arrays, tuples, and even built-in iterables for dictionaries, such as keys or values.

    # for end - iterative evaluation
    # z is local to the loop; use the global keyword to define a variable that outlasts the loop
    for i in 1:10
        z = i
        println("z is $z")
    end

    # Some sample for loop headers for different data types (each still needs a body and an end)
    for color in ["red", "green", "blue"] # an array
    for letter in "julia"                 # a string
    for element in (1, 2, 4, 8, 16, 32)   # a tuple
    for i in Dict("A"=>1, "B"=>2)         # a dictionary
    for i in Set(["a", "e", "a", "e", "i", "o", "i", "o", "u"]) # a set

Julia also provides the break and continue statements that allow us to alter the loops further. Following is their use:

break: Breaks out of the current closest enclosing loop.
continue: Goes to the top of the closest enclosing loop.

    # Example with break statement
    x = 0
    while true
        println(x)
        global x += 1
        x >= 4 && break # breaks out of the loop
    end

break and continue statements can appear anywhere inside the loop's body, but we will usually nest them inside an if statement to perform an action based on some condition.
Following are some other looping options:

    # list comprehensions
    [i^2 for i in 1:10]
    [(r,c) for r in 1:5, c in 1:2] # two iterators in a comprehension
    # Generator expressions - generator expressions can be used to produce values from iterating a variable
    sum(x^2 for x in 1:10)
    # Enumerating arrays
    m = rand(0:9, 3, 3)
    [i for i in enumerate(m)]
    # Zipping arrays
    for i in zip(0:10, 100:110, 200:210)
        println(i)
    end
    # Iterable objects
    ro = 0:2:100
    [i for i in ro]

6. Ability to execute statements of code conditionally, for example if marks are more than 40 then the student passes else fails

Julia provides several options to apply conditional logic. Let's take a look at them:

a. ternary and compound expressions:

    x = 1
    x > 3 ? "yes" : "no"

b. Boolean switching expressions:

    isodd(1000003) && @warn("That's odd!") # the warning runs only if the condition is true
    isodd(1000004) || @warn("That's odd!") # the warning runs only if the condition is false

c. if elseif else end - conditional evaluation:

    name = "Julia"
    if name == "Julia"
        println("I like Julia")
    elseif name == "Python"
        println("I like Python.")
        println("But I prefer Julia.")
    else
        println("I don't know what I like")
    end

d. Error handling using try.. catch. This allows the code to keep executing even if an error occurs, which would usually halt the program.

    # try catch error throw exception handling
    try
        <statement-that-might-cause-an-error>
    catch e # error gets caught if it happens
        println("caught an error: $e") # show the error if you want to
    end
    println("but we can continue with execution...")

    # Example 1 - error doesn't occur
    try
        a = 10 # no error
    catch e
        print(e)
    end

    # Example 2 - error occurs
    try
        la-la-la # undefined variable error
    catch e
        print(e)
    end

7. Put your code in functions

Functions allow us to create a block of code that can be executed many times without needing to write it again. Julia has something called a single expression function.
These are usually defined in one line like so:

    # Single expression functions
    f(x) = x * x
    g(x, y) = sqrt(x^2 + y^2)

Functions with multiple expressions are also supported and can be defined using the function keyword:

    # Syntax
    # Functions with multiple expressions
    function say_hello(name)
        println("hello ", name)
    end
    say_hello("vivek")

Additionally, functions can be programmed to return a single value or multiple values using the return keyword.

    # define function which returns a value
    function add_numbers(a, b)
        return a + b
    end
    # call the function
    add_numbers(2, 3)

    # define function which returns multiple values
    function add_multiply_numbers(a, b=10) # we can supply default values as well
        return (a + b, a * b)
    end
    # call the function
    add_multiply_numbers(2, 3)
    add_multiply_numbers(2)

args... lets a function take an arbitrary number of arguments. A for loop can be used to iterate over these arguments.

    function show_args(args...)
        for arg in args
            println(arg, " ")
        end
    end
    show_args(10, 20, 25, 35, 50)

Julia also supports anonymous functions, which have no name.

    map((x, y, z) -> x + y + z, [1, 2, 3], [4, 5, 6], [7, 8, 9])

Map and reduce can also be used to apply functions to arrays.

Map - If you already have a function and an array, you can call the function for each element of the array by using map()

    a = 1:10;
    map(sin, a) # map() returns a new array but if you call map!(), you modify the contents of the original array

The map() function collects the results of some function working on each and every element of an iterable object, such as an array of numbers.

    map(+, 1:10) # with a single collection this is the unary +, so each element comes back unchanged

The reduce() function does a similar job, but after every element has been seen and processed by the function, only one value is left. The function should take two arguments and return one.

    reduce(+, 1:10)

8.
Advanced data types that are formed through a combination of one or more types of basic data types such as structures or classes

Julia allows users to define their own types, using abstract type (for abstract types) or mutable struct (for concrete types). Let's take a look at both.

Abstract type

    abstract type MyAbstractType end            # By default, the type you create is a direct subtype of Any
    abstract type MyAbstractType2 <: Number end # the new abstract type is a subtype of Number

Concrete type using mutable struct

    # define the data type
    mutable struct student <: Any
        name
        age::Int
    end
    # initialize a variable of that data type
    x = student("vivek", 30)
    # use the variable
    x.name
    x.age

9. Read file from a disk and save file to a disk

Let's see how to read a file in an organized way.

    f = open("sherlock-holmes.txt") # To read text from a file, first obtain a file handle
    close(f) # When you've finished with the file, you should close the connection

If you use the following technique then you don't need to close the file. The open file is automatically closed when this block finishes.

    open("sherlock-holmes.txt") do file
        # do stuff with the open file
    end

10. Ability to comment your code so you can understand it when you revisit it some time later

We can tell Julia that a line of code is a comment by starting it with a #.

    # this is a comment

Overall, Julia is a powerful and flexible programming language that is well-suited for scientific computing and other high-performance tasks. With its emphasis on performance, multiple dispatch, and a growing ecosystem of packages and tools, Julia is a valuable tool for researchers, data scientists, and other professionals who need a fast, flexible, and expressive language for their work.

To close, I will emphasize the importance of practice when learning anything new. Persistence, and trying out different combinations of these building blocks to solve easier problems first and more complex ones later, is the only way to become fluent.

Comments welcome!

Coding and Maths · 2019-04-06

Introduction to Programming in Ruby

Quick Introduction Ruby is a high-level, interpreted programming language that was created in the mid-1990s by Yukihiro “Matz” Matsumoto. It is a general-purpose language that is designed to be easy to use and read, with syntax that is similar to natural language. Ruby is often used for web development, as well as for building command-line utilities, desktop applications, and other types of software. One of the key features of Ruby is its emphasis on programmer productivity and ease of use. Ruby’s syntax is designed to be intuitive and easy to read, making it accessible to both beginner and experienced programmers. Ruby also includes a number of built-in features and libraries that make it easy to accomplish common programming tasks, such as working with strings, arrays, and hashes. Another important feature of Ruby is its object-oriented programming model. Everything in Ruby is an object, and methods can be defined on objects to add functionality. Ruby also includes support for inheritance, encapsulation, and polymorphism, which makes it a powerful tool for building complex software systems. Ruby is also known for its extensive library of open-source gems, which are pre-built packages of code that can be easily integrated into Ruby projects. These gems provide a wide range of functionality, from database access to web development frameworks, and can save developers a significant amount of time and effort in building software. One of the most popular web development frameworks built in Ruby is Ruby on Rails. Rails is a full-stack web framework that provides a set of conventions and tools for building web applications quickly and easily. With its focus on developer productivity, Rails has become a popular choice for startups and small businesses, as well as for larger enterprises. 
Most modern programming languages have a set of similar building blocks, for example:

1. Receiving input from the user and showing output to the user
2. Ability to store values in variables (usually of different kinds such as integers, floating points or characters)
3. A string of characters where you can store names, addresses, or any other kind of text
4. Some advanced data types such as arrays which can store a series of regular variables (such as a series of integers)
5. Ability to loop your code: if you want to receive 10 names from a user, you do not write the code for that 10 times, but just once, and tell the computer to loop through it 10 times
6. Ability to execute statements of code conditionally, for example if marks are more than 40 then the student passes, else fails
7. Put your code in functions
8. Advanced data types that are formed through a combination of one or more types of basic data types such as structures or classes
9. Read file from a disk and save file to a disk
10. Ability to comment your code so you can understand it when you revisit it some time later

Let's dive right in and see how we can do these things in Ruby.

0. How to install Ruby on your desktop?

Before we can begin writing programs in Ruby, we need to set up our Ruby environment. You can install Ruby from ruby-lang.org. Additionally, you need to install an IDE to write and execute Ruby code. My personal favorite is VSCode (code.visualstudio.com). Lastly, you will also need to install the following extensions within VSCode: Ruby (Peng Lv) and Code Runner (Jun Han). Now, let's write a simple program that prints out hello world for the user to see:

    print 'Hello World !!!'

1. Receiving input from the user and Showing output to the user

There are several ways in which we can show output to the user. Let's look at some ways of showing output:

    #Method 1:
    print 'Hello World !!!'
    #Method 2:
    p 'Hello World !!!'
    #Method 3:
    puts 'Hello World !!!'
    #Method 4: Showing data stored in variables to user
    my_name = "Vivek"
    puts "Hello #{my_name}"
    #Method 5: Showing multiple variables using same puts statement
    aString = "I'm a string!"
    aBoolean = true
    aNumber = 42
    puts "string: #{aString} \nboolean: #{aBoolean} \nnumber: #{aNumber}"

2. Ability to store values in variables (usually of different kinds such as integers, floating points or character)

There are three main types of variable:

Strings (a collection of characters inside quotation marks)
Booleans (true or false)
Numbers (numeric values)

Following are some examples:

    aString = "I'm a string!"
    aBoolean = true
    aNumber = 42
    puts "string: #{aString} \nboolean: #{aBoolean} \nnumber: #{aNumber}"

Performing basic math on numeric variables. There are 6 types of basic operations: addition, subtraction, multiplication, division, modulo and exponent. Note that dividing two integers gives an integer, so 5/2 is 2.

    a = 5
    b = 2
    puts "sum: #{a+b} \ndifference: #{a-b} \nmultiplication: #{a*b} \ndivision: #{a/b} \nmodulo: #{a%b} \nexponent: #{a**b}"

3. A string of characters where you can store names, addresses, or any other kind of text

You can use single quotes or double quotes for strings - either one is acceptable.

    myFirstString = 'I am a string!' #single quotes
    mySecondString = "Me too!" #double quotes

There are a few common operations that we will focus on:

    "Hi!".length   #is 3
    "Hi!".reverse  #is !iH
    "Hi!".upcase   #is HI!
    "Hi!".downcase #is hi!
    # You can also use many methods at once. They are solved from left to right.
    "Hi!".downcase.reverse #is !ih
    # If you want to check if one string contains another string, you can use .include?.
    "Happy Birthday!".include?("Happy")

4. Some advanced data types such as arrays which can store a series of regular variables (such as a series of integers)

Arrays allow you to group multiple values together in a list. Each value in an array is referred to as an "element".

a. Defining an array:

    myArray = [] # an empty array
    myOtherArray = [1, 2, 3] # an array with three elements

b.
Accessing array elements:

    # In order to add to or change elements in an array, you can refer to an element by number.
    # Indexes start at 0, so this adds a fourth element to the three-element array above.
    myOtherArray[3] = 4

Ruby has another advanced data type called Hash, which is similar to a Python dictionary. Just like arrays, hashes allow you to store multiple values together. However, while arrays store values with a numerical index, hashes store information using key-value pairs. Each piece of information in the hash has a unique label, and you can use that label to access the value.

a. To create a hash, use Hash.new, or myHash={}. For example:

    myHash=Hash.new()
    myHash["Key"]="value"
    myHash["Key2"]="value2"
    # or
    myHash={ "Key" => "value", "Key2" => "value2" }

b. To access elements of a hash:

    puts myHash["Key"] # puts value

Instead of using a string as a key, you can also use a symbol, like this:

a. To create a hash with symbol keys, use Hash.new, or myHash={}. For example:

    myHash=Hash.new()
    myHash[:Key]="value"
    myHash[:Key2]="value2"
    # or
    myHash={ Key: "value", Key2: "value2", }

b. To access elements of a hash:

    puts myHash[:Key] # puts "value"

5. Ability to loop your code: if you want to receive 10 names from a user, you do not write the code for that 10 times, but just once, and tell the computer to loop through it 10 times

Ruby has several looping options (for, while, and until). There are options of nesting (single, double, triple, ..) loops as well.

a. For loop executes code once for each element in expression. Following example shows how a for loop works:

    # Syntax
    for variable [, variable ...] in expression [do]
      code
    end

    # Example
    for i in 0..5
      puts "Value of local variable is #{i}"
    end

b. While loop executes code while conditional is true. A while loop's conditional is separated from code by the reserved word do, a newline, backslash \, or a semicolon ;.
Following example shows how a while loop works:

    # Syntax
    while conditional [do]
      code
    end

    # Example
    a=1
    b=5
    while a<=b
      puts "run #{a}"
      a=a+1
    end

    # Ruby while modifier - Executes code while conditional is true.
    code while condition
    # or
    begin # If a while modifier follows a begin statement with no rescue or ensure clauses, code is executed once before conditional is evaluated.
      code
    end while conditional

c. Until loop executes code while conditional is false. An until statement's conditional is separated from code by the reserved word do, a newline, or a semicolon. Following example shows how an until loop works:

    # Syntax
    until conditional [do]
      code
    end

    # Example
    $i = 0
    $num = 5
    until $i > $num do
      puts("Inside the loop i = #$i" )
      $i +=1;
    end

    # Ruby until modifier - Executes code while conditional is false.
    code until conditional
    # or
    begin # If an until modifier follows a begin statement with no rescue or ensure clauses, code is executed once before conditional is evaluated.
      code
    end until conditional

d. Ruby also offers the following keywords that can modify the behavior of the above loops:

    # break - Terminates the most internal loop. Terminates a method with an associated block if called within the block (with the method returning nil).
    # next - Jumps to the next iteration of the most internal loop. Terminates execution of a block if called within a block (with yield or call returning nil).
    # redo - Restarts this iteration of the most internal loop, without checking loop condition. Restarts yield or call if called within a block.
    # retry - If retry appears in rescue clause of begin expression, restart from the beginning of the begin body.
    # retry - In old Ruby versions (1.8 and earlier), retry inside an iterator, a block, or the body of a for expression restarted the invocation of the iterator call, re-evaluating its arguments. Modern Ruby allows retry only inside a rescue clause.

6.
Ability to execute statements of code conditionally, for example if marks are more than 40 then the student passes else fails

Conditionals are used to add branching logic to your programs; they allow you to include complex behaviour that only occurs under specific conditions.

a. If - the if condition is an expression that can be checked for truth. If the expression evaluates to true, then the code within the block is executed.

    if condition
      something to be done
    end
    # Ruby if modifier - executes code if the conditional is true.
    code if condition

Following is an actual example of an if statement with both an elsif and an else.

    booleanOne = true
    randomCode = "Hi!"
    if booleanOne
      puts "I will be printed!"
    elsif randomCode.length>=1
      puts "Even though the above code is true, I won't be executed because the earlier if statement was true!"
    else
      puts "I won't be printed because the if statement was executed!"
    end

b. If Else - You can combine if with the keyword else. This lets you execute one block of code if the condition is true, and a different block if it is false. The else block will only be executed if the if block doesn't run, so they will never both be executed.

    if condition
      something to be done
    else
      something to be done if the condition evaluates to false
    end

c. Elsif - When you want more than two options, you can use elsif. This allows you to add more conditions to be checked. Still only one of the code blocks will be run, because the statement only executes the code in the first applicable block; once a condition has been satisfied, the whole statement ends. Here is the if/elsif/else statement syntax:

    if condition
      something to be done
    elsif different condition
      something else to be done
    else
      another different thing to be done
    end

d. Unless - Executes code if conditional is false. If the conditional is true, code specified in the else clause is executed.
    unless condition
      # thing to be done if the condition is false
    else # else is optional
      # thing to be done if the condition is true
    end
    # Ruby unless modifier - Executes code if conditional is false.
    code unless conditional

e. Case - this is basically the same as an if-elsif-else statement, but with clearer syntax.

    # case statement syntax
    case expr0
    when expr1, expr2
      stmt1
    when expr3, expr4
      stmt2
    else
      stmt3
    end

    # is basically similar to the following:
    if expr1 === expr0 || expr2 === expr0
      stmt1
    elsif expr3 === expr0 || expr4 === expr0
      stmt2
    else
      stmt3
    end

Example of case statement

    $age = 5
    case $age
    when 0 .. 2
      puts "i will not be printed"
    when 3 .. 6
      puts "i will be printed"
    when 7 .. 12
      puts "i will not be printed"
    when 13 .. 18
      puts "youth"
    else
      puts "i will not be printed"
    end

7. Put your code in functions

a. In Ruby we call functions methods. Methods are reusable sections of code that perform specific tasks in our program. Using methods means that we can write simpler, more easily readable code.

    # syntax
    def methodname
      # method code here
    end

b. Methods can also be defined to accept and process any parameters that are passed to them:

    # Methods With Parameters
    def laugh(number)
      puts "haha " * number
    end

c. We can call methods using the name of the method, and specify the parameters within parentheses or without them:

    # Using method - calling method as follows prints "haha" 5 times on the screen
    laugh(5)
    # You can also call laugh without parentheses
    laugh 5

d. We can set default values for the parameters, which will be used if the method is called without passing the required parameters

    def method_name (var1 = value1, var2 = value2)
      expr..
    end

e. We can also return values. The return statement in Ruby is used to return one or more values from a Ruby method.

    return
    # or
    return 12
    # or
    return 1,2,3

f.
We can also define methods with a variable number of parameters, like so:

# Variable Number of Parameters
def sample(*test)
  puts "The number of parameters is #{test.length}"
  for i in 0...test.length
    puts "The parameters are #{test[i]}"
  end
end
sample "Zara", "6", "F"
sample "Mac", "36", "M", "MCA"

8. Advanced data types that are formed through a combination of one or more types of basic data types such as structures or classes

Ruby allows the user to create classes. These can be a combination of variables and functions that operate on those variables. Let's take a look at how we can define and use them.

# Define a class (in Ruby a class name must start with a capital letter)
class Employee
  @@no_of_customers = 0
  def initialize(id, name, addr)
    @cust_id = id
    @cust_name = name
    @cust_addr = addr
  end
end
# Creating an object of the class and using that
cust1 = Employee.new("1", "Vivek", "Somewhere on the, Internet")

9. Read file from a disk and save file to a disk

Let's see how to read and parse CSV in an organized way. CSV is the most common file type you will be using for data science; however, Ruby can read several other file types as well.

require 'csv'
# read a csv
CSV.read("file.csv")
# parse a string of text which is in csv format
CSV.parse("1,penny\n2,nickel\n3,dime")

10. Ability to comment your code so you can understand it when you revisit it some time later

a. We can tell Ruby that a line of code is a comment by starting it with #.

#this is a comment

b. We can also specify a comment block, like so:

=begin
There are three main types of variable:
1. Strings (a collection of symbols inside speech marks)
2. Booleans (true or false)
3. Numbers (numeric values)
=end

Overall, Ruby is a powerful and flexible programming language that is well-suited for a wide range of programming tasks. With its focus on ease of use, object-oriented design, and extensive library of gems, Ruby is a valuable tool for both beginner and experienced programmers.
Whether building web applications, desktop utilities, or other types of software, Ruby provides a fast, flexible, and enjoyable development experience. To close I will emphasize the importance of practicing in learning anything new. Persistence and trying out different combinations of these building blocks for solving easier problems first and more complex ones later on is the only way to become fluent. Comments welcome!

Coding and Maths · 2019-03-02

Introduction to Programming in C++

Quick Introduction C++ is a powerful and popular programming language that was developed in the 1980s as an extension of the C programming language. It is a high-level, object-oriented language that is used to develop a wide range of applications, including operating systems, device drivers, game engines, and more. C++ is also widely used in the field of finance and quantitative analysis, due to its speed and efficiency. One of the key features of C++ is its ability to directly manipulate memory, allowing for low-level control over the hardware. C++ is also known for its efficiency and speed, making it a popular choice for developing applications that require high performance, such as video games and real-time systems. Another key feature of C++ is its support for object-oriented programming (OOP). This allows programmers to define their own classes and objects, and to encapsulate data and functionality within those objects. OOP allows for code reusability, modularity, and flexibility, making it a popular paradigm in software development. C++ is also known for its support for templates and generic programming. Templates allow programmers to write generic code that can work with different data types, without having to write separate code for each type. This can greatly simplify code development and maintenance, and can make C++ code more efficient and easier to read. 
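As a small illustration of the template idea described above, here is a minimal sketch of a function template. The name largerOf is made up for this example and is not part of the C++ standard library (for real code the standard library already offers std::max); the point is only to show one definition serving several data types.

```cpp
#include <string>

// Hypothetical helper for illustration only (not from the article or the standard library).
// Returns the larger of two values of the same type T.
// It works for any type T that supports the > operator, for example int, double or std::string.
template <typename T>
T largerOf(T a, T b) {
    return (a > b) ? a : b;
}
```

Calling largerOf(3, 7), largerOf(2.5, 1.5) or largerOf(std::string("apple"), std::string("pear")) makes the compiler generate a separate version of the function for int, double and std::string from this single definition, so no per-type copy of the code has to be written by hand.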
Most modern programming languages have a set of similar building blocks, for example:
1. Receiving input from the user and Showing output to the user
2. Ability to store values in variables (usually of different kinds such as integers, floating points or character)
3. A string of characters where you can store names, addresses, or any other kind of text
4. Some advanced data types such as arrays which can store a series of regular variables (such as a series of integers)
5. Ability to loop your code in the sense that if you want to receive 10 names from a user, you will not write the code for that 10 times, but just once, and tell the computer to loop through it 10 times
6. Ability to execute statements of code conditionally, for example if marks are more than 40 then the student passes else fails
7. Put your code in functions
8. Advanced data types that are formed through a combination of one or more types of basic data types such as structures or classes
9. Read file from a disk and save file to a disk
10. Ability to comment your code so you can understand it when you revisit it some time later

Let's dive right in and see how we can do these things in C++.

0. How to install C++ on your desktop?

Before we can begin to write a program in C++, we need a compiler and an editor; this article uses Dev-C++. Once it is installed, go ahead and open the IDE and try out the following code to see if everything is in order.

#include <iostream>
using namespace std;
int main() {
    cout << "Hello World!";
    return 0;
}

As you noticed, unlike languages such as Python, R or Ruby, it takes more than a few statements just to display basic text to the user in C++. In the next section we will try to dismantle this code and understand the various components. Let's however cover a few important points:
In C++ we need to end each statement with a semi-colon ;
The scope of statements is defined using curly brackets {}, unlike Python where the scope is defined through indentation
All statements need to be within a function.
Here we have included the statements in the main() function, which is the first function that is executed when the program runs. All other functions are called, directly or indirectly, from within this function.

1. Receiving input from the user and Showing output to the user

The following program shows output to the user. The #include statement pulls in the iostream header file, which plays a role similar to importing a library in Python. This header file declares the basic input and output constructs. Next is int main(), which says that the main function will return an integer after execution. Within the main function we use cout << to show the text to the user. The text is enclosed in double quotes “text”. endl after the text inserts a new line in the output (and flushes the output buffer). Finally we return 0, as the main function is supposed to return an integer. 0 signifies that everything was in order during the execution of the program.

#include <iostream>
using namespace std;
int main() {
    cout << "This is some text." << endl;
    return 0;
}

We can modify this program to accept input from the user. The cin >> statement allows us to receive input. The variable in which we store the received input needs to be defined beforehand.

#include <iostream>
using namespace std;
int main() {
    int age_ = 0;
    cout << "What is your age?";
    cin >> age_;
    cout << "So your age is: " << age_;
    return 0;
}

2. Ability to store values in variables (usually of different kinds such as integers, floating points or character)

C++ is statically typed: you need to declare a variable's name and data type before using it. Basic data types: in C++ we have several types of variables; let's take a look at the important ones:

// Integer
int numberCats=5;
long int bigNumberCats=5; //long int can be used for storing large values
// Floating point numbers.
// These are numbers with digits after the decimal point
float pi=3.1415926535; //a float keeps only about 7 significant digits
// Double
double dValue=3.1415926535; //for more significant digits we need to use a type other than float
long double ldValue=3.1415926535;
// Boolean
bool bval=true; //boolean type is true or false; c++ uses 1 for true and 0 for false when outputting
// Character
char cval=55, cval2='7'; //takes exactly 1 byte of computer memory, char represents single characters from the ascii character set, 55 is the ascii code for 7, this is not the number 7 but the character 7
// String
string myname;

3. A string of characters where you can store names, addresses, or any other kind of text

A string in C++ can be defined using the string type from the standard library. It can be assigned using input from the user, or by providing text within double quotes “text”.

string yourName;
cout << "\n\nwhat is your name? ";
cin >> yourName; //note that cin >> reads a single word, up to the first whitespace
cout <<"\nnice to meet you "<<yourName<<endl<<endl;

4. Some advanced data types such as arrays which can store a series of regular variables (such as a series of integers)

Arrays are a series of variables of the same type stored together in one variable. Arrays can be one-dimensional or multi-dimensional.

One-dimensional arrays:

// Defining
int ar[3];
// Initializing the array
ar[0]=10;
ar[1]=20;
ar[2]=30;
// Supports indexing
cout<<ar[0]; // this will output the value stored at index 0, which is 10

Multi-dimensional arrays:

// Defining and initializing the array (the values must be given at the point of definition)
int mar[3][2]={ {34,188}, {29,165}, {29,160} }; //multi-dim array
// Supports indexing
cout<<mar[0][0]; // this will output the value stored at row index 0 x column index 0, which is 34

Loops can be used to iterate over one-dimensional or multi-dimensional arrays. We will take a closer look at this in the next section.

5.
Ability to loop your code in the sense that if you want to receive 10 names from a user, you will not write the code for that 10 times, but just once, and tell the computer to loop through it 10 times

C++ has several looping options such as ‘for’, ‘while’ and ‘do while’. There are also options of nesting (single, double, triple, ..) loops.

a. The for loop

// Syntax
for (int i=0;i<10;i++){
    statements to do stuff
}
// iterate over elements of one-dimensional array
// practice - create an array with a table of 12
int t12[10];
for (int i=0;i<10;i++){
    t12[i]=12*(i+1);
}
// iterate over elements of two-dimensional array (concept of nesting - we will enclose a for loop within another for loop)
int mar[3][2]={ {34,188}, {29,165}, {29,160} }; //multi-dim array
cout<<"\nthis is a multi dimensional array: ";
for (int i=0;i<3;i++){ //3 rows in the array
    cout<<"\nrow "<<i+1<<": ";
    for (int j=0;j<2;j++){ //2 columns in the array
        cout<<"col "<<j+1<<": "<<mar[i][j]<<", ";
    }
}

b. The While loop executes the same code again and again for as long as its condition remains true:

// Syntax
int i=0;
while (i<10){
    code statements;
    i+=1;
}
// Example
int i=1;
cout<<"\n\nwhile loop - first 10 natural numbers"<<endl;
while (i<=10){
    cout<<i<<", ";
    i+=1; //same as i=i+1 or i++
}

c. The Do-While loop also executes the same code again and again for as long as its condition remains true. The difference from the while loop is that in a do-while loop the body is executed at least once before the condition is checked.

// Syntax
int i=0;
do{
    code statements;
    i+=1;
}while (i<10);
// Example
//for example if you want the user to enter the password again and again until they enter the correct password
cout<<"\n\ndo-while loop\n";
i=1;
string pass="pass", pass2;
do{
    if(i!=1){
        cout<<"\naccess denied, try again";
    }
    cout<<"\nenter your password?";
    cin>>pass2;
    i=0;
}while(pass2 != pass);
cout<<"\npassword accepted\n\n";

C++ also provides the break and continue statements that allow us to alter the loops further.
Following is their use:

break jumps immediately out of the loop. It is mostly used in while loops but can also be used in for loops.

// break statement example
cout<<"\nbreak statement\n";
for(int f=1;f<11;f++){
    if(f==5){
        break; //we break out of the loop when f==5, and dont execute the loop for f>=5
    }
    cout<<f<<", ";
}

continue is similar to break, but it only skips the rest of the current iteration and then carries on with the next iterations.

// continue statement example
cout<<"\ncontinue statement\n";
for(int f=1;f<11;f++){
    if(f==5){
        continue;
    }
    cout<<f<<", "; //this statement not executed for f==5
}

6. Ability to execute statements of code conditionally, for example if marks are more than 40 then the student passes else fails

C++ provides if.., if..else.., and switch statements to apply conditional logic. Let's take a look at them:

a. The basic syntax for creating an if statement (here with an else branch) is:

/////////// IF STATEMENT ////////////
string pass="password",pass2;
cout<<"\n\n--if statement capability--\n";
cout<<"\nenter password:";
cin>>pass2;
if (pass==pass2){
    cout<<"\npassword matches! you can enter!!";
}
else{
    cout<<"\npassword doesnt match! begone!!";
}

b. The basic syntax for creating an if…else if…else statement is:

/////////// IF-ELSE STATEMENT ////////////
int menuChoice=5;
cout<<"\n\n--if-else statement capability--\n";
cout<<"\n1.\tadd record";
cout<<"\n2.\tdelete record";
cout<<"\n3.\texit";
cout<<"\nwhat do you want to do?";
cin>>menuChoice;
if (menuChoice==1){
    cout<<"\nlets add some records!!";
}
else if (menuChoice==2){
    cout<<"\nlets delete some records!!";
}
else{
    cout<<"\nexiting! good-bye!!";
}

c.
The basic syntax for creating a switch statement is:

/////////// SWITCH STATEMENT ////////////
int menuChoice2=5;
cout<<"\n\n--switch statement capability--\n";
cout<<"\n1.\tadd record";
cout<<"\n2.\tdelete record";
cout<<"\n3.\texit";
cout<<"\nwhat do you want to do?";
cin>>menuChoice2;
switch(menuChoice2){
    case 1:
        cout<<"\nlets add some records!!";
        break;
    case 2:
        cout<<"\nlets delete some records!!";
        break;
    case 3:
        cout<<"\nexiting! good-bye!!";
        break;
    default:
        cout<<"\n!!!!error!!!!";
}

7. Put your code in functions

Functions allow us to create a block of code that can be executed many times without needing to write it again.

// Following is an example case where we define a function that shows a menu to the user
int sub_menu(int choice)
{
    switch(choice){
        case 1:
            cout<<"\nLets add a new record";
            break;
        case 2:
            cout<<"\nLets view an existing record";
            break;
        case 3:
            cout<<"\nLets delete an existing record";
            break;
        default:
            cout<<"\nExiting! Goodbye!!";
    }
    return 0;
}

We can call the function by its name, passing a value for its parameter:

// lets say we are writing the main() and we want to call the function
// lines-of-code
sub_menu(2); // shows the message for menu option 2
// lines-of-code

8. Advanced data types that are formed through a combination of one or more types of basic data types such as structures or classes

C++ allows the user to create classes. These can be a combination of variables and functions that operate on those variables. Let's take a look at how we can define and use them.

// Create a Car class with some attributes
class Car {
    public:
        string brand;
        string model;
        int year;
};
// Create an object of Car (these statements go inside a function such as main())
Car carObj1;
carObj1.brand = "Mahindra";
carObj1.model = "Scorpio";
carObj1.year = 2020;
// Using the object
cout << carObj1.brand << " " << carObj1.model << " " << carObj1.year << "\n";

9. Read file from a disk and save file to a disk

Let's see how to read and write a text file in an organized way. We include the fstream header file, which provides the classes necessary to read and write files.
#include <fstream>
// read a text file
string line;
ifstream infile ("file.txt");
if (infile.is_open())
{
    while ( getline (infile,line) )
    {
        cout << line << '\n';
    }
    infile.close();
}
else cout << "Unable to open file";
// write a text file (a different variable name is used so both snippets can live in the same scope)
ofstream outfile ("file.txt");
if (outfile.is_open())
{
    outfile << "This is a line.\n";
    outfile << "This is another line.\n";
    outfile.close();
}
else cout << "Unable to open file";

10. Ability to comment your code so you can understand it when you revisit it some time later

We can tell C++ that a line of code is a comment as follows.

// this is a comment

We can mark a multi-line block of text as a comment as follows.

/* this is a
comment block */

While C++ can be a powerful tool, it can also be complex and difficult to learn, especially for beginners. The language has a steep learning curve, and requires a solid understanding of programming concepts such as pointers, memory management, and OOP. However, with the right resources and dedication, C++ can be a rewarding and powerful tool for software development. Overall, C++ is a popular and powerful programming language that is used in a wide range of applications, from operating systems to video games. Its efficiency, speed, and support for OOP and generic programming make it a versatile and powerful tool for software developers. To close I will emphasize the importance of practicing in learning anything new. Persistence and trying out different combinations of these building blocks for solving easier problems first and more complex ones later on is the only way to become fluent. Comments welcome!
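For readers who want to practice the file reading and writing steps from section 9, here is a minimal sketch that wraps them in two small functions. The function names writeLines and readLines are invented for this example, and the sketch assumes a compiler with C++11 or later; it is one possible arrangement, not the only way to organize file access.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes each entry of `lines` to the file at `path`, one per line.
// Returns false if the file could not be opened for writing.
bool writeLines(const std::string& path, const std::vector<std::string>& lines) {
    std::ofstream out(path);
    if (!out.is_open()) {
        return false;
    }
    for (const std::string& line : lines) {
        out << line << '\n';
    }
    return true;  // the stream closes automatically when `out` goes out of scope
}

// Hypothetical helper: reads the file at `path` and returns its lines.
// If the file cannot be opened, the returned vector is simply empty.
std::vector<std::string> readLines(const std::string& path) {
    std::vector<std::string> lines;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

From main() you could call writeLines("file.txt", {"This is a line.", "This is another line."}) and then loop over the result of readLines("file.txt") to print each line with cout. Using separate ofstream and ifstream variables inside their own functions avoids any clash between the reading and the writing stream.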

Coding and Maths · 2019-02-02

Introduction to Programming in Microsoft Excel VBA

Quick Introduction Excel VBA, or Visual Basic for Applications, is a programming language that can be used to automate tasks and enhance functionality in Microsoft Excel. VBA is a powerful tool that allows users to write custom macros and functions to automate repetitive tasks, perform complex calculations, and create custom solutions. VBA is a type of Visual Basic, which is an object-oriented programming language developed by Microsoft. VBA is integrated directly into Excel, making it easy to access and use. VBA code is stored in modules, which can be accessed through the Visual Basic Editor in Excel. In the Editor, users can write, edit, and run VBA code, as well as debug their code to identify and fix any errors. One of the key advantages of VBA is that it allows users to automate repetitive tasks that would otherwise be time-consuming to perform manually. For example, users can write a VBA macro to format data, generate reports, or update data in bulk. VBA can also be used to perform complex calculations, create custom user interfaces, and interact with other applications. To get started with VBA, users should have a basic understanding of programming concepts and syntax. The VBA language is based on Visual Basic, so many programming concepts, such as variables, loops, and conditional statements, are similar to other programming languages. Excel also provides many built-in functions and objects that can be used in VBA code, making it easy to access and manipulate data in a spreadsheet. 
Most modern programming languages have a set of similar building blocks, for example:
1. Receiving input from the user and Showing output to the user
2. Ability to store values in variables (usually of different kinds such as integers, floating points or character)
3. A string of characters where you can store names, addresses, or any other kind of text
4. Some advanced data types such as arrays which can store a series of regular variables (such as a series of integers)
5. Ability to loop your code in the sense that if you want to receive 10 names from a user, you will not write the code for that 10 times, but just once, and tell the computer to loop through it 10 times
6. Ability to execute statements of code conditionally, for example if marks are more than 40 then the student passes else fails
7. Put your code in functions
8. Advanced data types that are formed through a combination of one or more types of basic data types such as structures or classes
9. Read file from a disk and save file to a disk
10. Ability to comment your code so you can understand it when you revisit it some time later

Let's dive right in and see how we can do these things in VBA.

0. Enable VBA in your Excel file

Before we can begin to write a program in VBA, also known as a macro, we need to enable the Developer tab. You can do this by going to File > Options > Customize Ribbon and ticking the Developer checkbox. Once the Developer tab is available, go there and choose the leftmost option, which says Visual Basic. Now you will see a panel on the left where you can double-click the name of the sheet you are working on. This will open an empty code window. Write the following code there and save the file as a macro-enabled workbook (the extension will be .xlsm).

Sub simple_hello()
    Range("A2").Value = "Hello World!"
End Sub

Close the file, then open it again and choose the option (if shown) to enable macros. Now go to the Developer tab again and this time select the second option, called Macros. Here you should see the macro that you just created. Select it and hit run!
1. Receiving input from the user and Showing output to the user

There are several ways in which a macro can show output to the user. Let’s look at some ways of showing output (Method 5 also reads input from cells C5 and C6):

'Method 1:
Range("A2").Value = "Hello"
'Method 2:
Worksheets("Sheet1").Range("B2").Value = "Hello"
'Method 3:
Worksheets(1).Range("C2").Value = "Hello"
'Method 4:
MsgBox "I added Hello in cell A2, B2 and C2"
'Method 5:
MsgBox "Hello " & Range("C5").Value & vbNewLine & "So you are " & Range("C6") & " years old!"

2. Ability to store values in variables (usually of different kinds such as integers, floating points or character)

Four commonly used variable types in VBA are Integer, String, Double and Boolean. Integer is good for storing whole numbers, Double is for numbers with decimals, String is for character input and Boolean is for True/False (yes/no) type of data. Here are some examples:

'Integer:
Dim x As Integer
x = 6
Range("A1").Value = x
'String:
Dim book As String
book = "bible"
Range("A1").Value = book
'Double (a different name is used because x is already declared above):
Dim y As Double
y = 5.5
MsgBox "value is " & y
'Boolean:
Dim continue As Boolean
continue = True
If continue = True Then MsgBox "Boolean variables are cool"

3. A string of characters where you can store names, addresses, or any other kind of text

The key idea here is to learn how to manipulate string variables. There are a few common operations that we will focus on:

a. Joining strings

'Join Strings
Dim text1 As String, text2 As String
text1 = "Hi"
text2 = "Tim"
MsgBox text1 & " " & text2

b. Left/Right or Mid functions - To extract the leftmost, rightmost or middle characters from a string.

Dim text As String
text = "example text"
MsgBox Left(text, 4)
'Just as with Left, we can also extract a substring from the right or middle
MsgBox Right("example text", 2)
MsgBox Mid("example text", 9, 2)

c. To get the length of a string, use Len.

MsgBox Len("example text")

d. To find the position of a substring in a string, use InStr.

MsgBox InStr("example text", "am")

4.
Some advanced data types such as arrays which can store a series of regular variables (such as a series of integers)

Arrays are a series of values of a similar type stored together in one variable. Arrays can be one-dimensional or multi-dimensional.

a. The following example shows how a one-dimensional array works:

Dim Films(1 To 5) As String
Films(1) = "Lord of the Rings"
Films(2) = "Speed"
Films(3) = "Star Wars"
Films(4) = "The Godfather"
Films(5) = "Pulp Fiction"
MsgBox Films(4)

b. The following example shows how a two-dimensional array works:

Dim Films(1 To 5, 1 To 2) As String
Dim i As Integer, j As Integer
For i = 1 To 5
    For j = 1 To 2
        Films(i, j) = Cells(i, j).Value
    Next j
Next i
MsgBox Films(4, 2)

5. Ability to loop your code in the sense that if you want to receive 10 names from a user, you will not write the code for that 10 times, but just once, and tell the computer to loop through it 10 times

VBA has several looping options (for, do-while, do-until). There are options of nesting (single, double, triple, ..) loops.

a. The following example shows how a simple/single for loop works:

Dim i As Integer
For i = 1 To 6
    Cells(i, 1).Value = 100
Next i

b. The following example shows how a double for loop works:

Dim i As Integer, j As Integer
For i = 1 To 6
    For j = 1 To 2
        Cells(i, j).Value = 100
    Next j
Next i

c. The following example shows how a triple for loop works:

Dim c As Integer, i As Integer, j As Integer
For c = 1 To 3
    For i = 1 To 6
        For j = 1 To 2
            Worksheets(c).Cells(i, j).Value = 100
        Next j
    Next i
Next c

VBA also has a do-while loop. The following example shows how it works:

Dim i As Integer
i = 1
Do While i < 6
    Cells(i, 1).Value = 20
    i = i + 1
Loop

VBA also has a do-until loop. The following example shows how it works:

Dim i As Integer
i = 1
Do Until i > 6
    Cells(i, 1).Value = 20
    i = i + 1
Loop

6. Ability to execute statements of code conditionally, for example if marks are more than 40 then the student passes else fails

a.
If Then Statement - VBA has the option of an if statement, which executes a piece of code only if a specified condition is met.

Dim score As Integer, result As String
score = Range("A1").Value
If score >= 60 Then result = "pass"
Range("B1").Value = result

b. If Else Statement - VBA has the option of an if-else statement, which executes a piece of code only if a specified condition is met; if not, it executes another piece of code.

Dim score As Integer, result As String
score = Range("A1").Value
If score >= 60 Then
    result = "pass"
Else
    result = "fail"
End If
Range("B1").Value = result

c. Select Case Statement - VBA also has a Select Case statement, which compares one expression against several cases and executes the code of the first case that matches.

'Select Case
'First, declare two variables. One variable of type Integer named score and one variable of type String named result
Dim score As Integer, result As String
'We initialize the variable score with the value of cell A1
score = Range("A1").Value
'Add the Select Case structure
Select Case score
    Case Is >= 80
        result = "very good"
    Case Is >= 70
        result = "good"
    Case Is >= 60
        result = "sufficient"
    Case Else
        result = "insufficient"
End Select
'Write the value of the variable result to cell B1
Range("B1").Value = result

7. Put your code in functions

VBA allows us to specify a function or a sub. The difference between the two is that a function allows us to return a value whereas a sub does not.

a. Function - If you want Excel VBA to perform a task that returns a result, you can use a function. Place a function into a module (In the Visual Basic Editor, click Insert, Module). For example, the function with name Area.

'Explanation: This function has two arguments (of type Double) and a return type (the part after As also of type Double). You can use the name of the function (Area) in your code to indicate which result you want to return (here x * y).
Function Area(x As Double, y As Double) As Double
    Area = x * y
End Function

'Explanation: The function returns a value so you have to 'catch' this value in your code. You can use another variable (z) for this. Next, you can add another value to this variable (if you want). Finally, display the value using a MsgBox.
Dim z As Double
z = Area(3, 5) + 2
MsgBox z

b. Sub - If you want Excel VBA to perform some actions, you can use a sub. Place a sub into a module (In the Visual Basic Editor, click Insert, Module). For example, the sub with name Area.

Sub Area(x As Double, y As Double)
    MsgBox x * y
End Sub

'Explanation: This sub has two arguments (of type Double). It does not have a return type! You can refer to this sub (call the sub) from somewhere else in your code by simply using the name of the sub and giving a value for each argument.
'Call it using
Area 3, 5

8. Advanced data types that are formed through a combination of one or more types of basic data types such as structures or classes

VBA class modules allow us to create our own object types, which bundle data (properties) together with the procedures (methods) that operate on that data. Each object created from a class is independent of the others, but all of them share the same class definition. A detailed example of how to do this is out of the scope of this article.

9. Read file from a disk and save file to a disk

Out of scope of this article.

10. Ability to comment your code so you can understand it when you revisit it some time later

We can tell VBA that a line of code is a comment by starting it with a single inverted comma (apostrophe).

'this is a comment

Overall, Excel VBA is a powerful tool that can help users automate tasks, improve productivity, and enhance the functionality of Microsoft Excel. With its flexibility and ease of use, VBA is a valuable tool for users of all skill levels, from beginners to advanced programmers. To close I will emphasize the importance of practicing in learning anything new.
Persistence and trying out different combinations of these building blocks for solving easier problems first and more complex ones later on is the only way to become fluent. Comments welcome!

Coding and Maths · 2019-01-05
